Abstract

The labeled stochastic block model is a random graph model representing networks with 
community structure and interactions of multiple types. In its simplest form, it consists of two 
communities of approximately equal size, and the edges are drawn and labeled at random with 
probability depending on whether their two endpoints belong to the same community or not. 

It has been conjectured in [16] that correlated reconstruction (i.e. identification of a partition 
correlated with the true partition into the underlying communities) would be feasible if and 
only if a model parameter exceeds a threshold. We prove one half of this conjecture, i.e., 
reconstruction is impossible when below the threshold. In the positive direction, we introduce 
a weighted graph to exploit the label information. With a suitable choice of weight function, 
we show that when above the threshold by a specific constant, reconstruction is achieved by 
(1) minimum bisection, (2) a semidefinite relaxation of minimum bisection, and (3) a spectral 
method combined with removal of edges incident to vertices of high degree. Furthermore, 
we show that hypothesis testing between the labeled stochastic block model and the labeled
Erdős-Rényi random graph model exhibits a phase transition at the conjectured reconstruction
threshold.


1 Introduction 

1.1 Motivation 

Community detection aims to identify underlying communities of similar characteristics in an overall 
population from the observation of pairwise interactions between individuals [12, 24, 23]. The 
stochastic block model, also known as the planted partition model, is a popular random graph model
for analyzing the community detection problem [25, 28, 2, 27, 9], in which pairwise interactions 
are binary: an edge is either present or absent between two individuals. In its simplest form, 
the stochastic block model consists of two communities of approximately equal size, where the 
within-community edge is present at random with probability p while the across-community edge
is present with probability q. If p > q, it corresponds to assortative communities where interactions 
are more likely within rather than across communities; while p < q corresponds to disassortative 
communities. 

In practice, interactions can be of various types and these types reveal more information on the 
underlying communities than the mere existence of the interaction itself. For example, in recommender
systems, interactions between users and items come with user ratings. Such ratings contain
far more information than the interaction itself to characterize the user and item types. Similarly,
protein-protein chemical interactions in biological networks can be exothermic and endothermic; 
email exchanges in a club may be formal or informal; friendship in social networks may be strong or 
weak. The labeled stochastic block model was recently proposed in [16] to capture rich interaction 
types. In this model interaction types are described by labels drawn from an arbitrary collection. 
In particular, for the simple two-community case, the within-community edge is labeled at random
with distribution $\mu$, while the across-community edge is labeled with a different distribution
$\nu$. In this context an important question is how to leverage the labeling information for detecting
underlying communities. 

1.2 Information-Scarce Regime 

In this paper, we focus on the sparse labeled stochastic block model in which every vertex has a 
limited average degree, i.e., $p, q = O(1/n)$, where $n$ is the number of vertices. It corresponds to the
information-scarce regime where only $O(n)$ edges and labels are observed in total (see footnote 1). This regime is
of practical interest, arising in several contexts. For example, in recommender systems, users only 
give ratings to few items; in biological networks, only few protein-protein interactions are observed 
due to cost constraints; in social networks, a person only has a limited number of friends. 

For the stochastic block model in this information-scarce regime, there are $\Theta(n)$ isolated vertices,
as in Erdős-Rényi random graphs with bounded average degree. For isolated vertices, it is
impossible to determine their community membership and thus exact reconstruction of communities
is impossible. Therefore, we resort to finding a partition into communities positively correlated
to the true community partition (see Definition 1 below).

1.3 Main Results 

Focusing on the two communities scenario, we show that a positively correlated reconstruction is 
fundamentally impossible when below a threshold. This establishes one half of the conjecture in 
[16]. In the positive direction, we establish the following results. We introduce a graph weighted 
by a suitable function of observed labels, on which we show that: 

(1) Minimum bisection gives a positively correlated partition when above the threshold by a
factor of $64 \ln 2$.

(2) A semidefinite relaxation of minimum bisection gives a positively correlated partition when
above the threshold by a factor of $2^{17} \ln 2$.

(3) A spectral method combined with removal of edges incident to vertices of high degree gives 
a positively correlated partition when above the threshold by a constant factor. 

Furthermore, we show that the labeled stochastic block model is contiguous to a labeled Erdős-Rényi
random graph when below the reconstruction threshold and orthogonal to it when above
the threshold. It implies that for the hypothesis testing problem between the labeled stochastic 
block model and the labeled Erdos-Renyi random graph model, the correct identification of the 
underlying distribution is feasible if and only if above the reconstruction threshold. It also implies 
that there is no consistent estimator for model parameters when below the reconstruction threshold. 

1.4 Related Work 

For the stochastic block model, most previous work focuses on the “dense” regime with an average 
degree diverging as the size of the graph n grows (see, e.g., [4, 5] and the references therein).

Footnote 1: We also provide results for $p, q = O(\mathrm{polylog}(n)/n)$ in Theorem 4.

For the “sparse” regime with bounded average degrees, a sharp phase transition threshold for
reconstruction was conjectured in [9] by analyzing the belief propagation algorithm. The converse 
part of the conjecture was rigorously proved in [22]. The achievability part was proved independently
in [21, 19]. In addition, it is shown in [6] that a variant of the spectral method gives a positively
correlated partition when above the threshold by an unknown constant factor. More recently, it was
shown in [15] that a semidefinite program finds a correlated partition when above the threshold by
some large constant factor.

The labeled stochastic block model was first proposed and studied in [16] and a new reconstruction
threshold that incorporates the extra labeling information was conjectured. Simulations further 
indicate that the belief propagation algorithm works when above the threshold, but reconstruction 
algorithms that provably work are still unknown. 

Finally, we recently became aware of the work [1] that studies the problem of decoding binary
node labels from noisy edge measurements. In the case where the background graph is an Erdős-Rényi
random graph and each node label is independently and uniformly chosen from $\{\pm 1\}$, the
model in [1] can be viewed as a special case of the labeled stochastic block model with $p = q$,
$\mu = (1-\epsilon)\delta_{+1} + \epsilon\delta_{-1}$ and $\nu = \epsilon\delta_{+1} + (1-\epsilon)\delta_{-1}$, where $\delta_x$ denotes the probability measure
concentrated on point $x$ (see Section 2 for the formal model description). When $p = q = a \log n/n$
for some constant $a$ and $\epsilon \to 1/2$, it is shown in [1] that exact recovery of node labels is possible if
and only if $a(1-2\epsilon)^2 > 2$. In contrast, our results show that when $p = q = a/n$ for some constant
$a$, correlated recovery of node labels is impossible if $a(1-2\epsilon)^2 < 1$ for any $0 < \epsilon < 1$. Moreover,
we show that distinguishing hypothesis $\epsilon = \epsilon_0$ and hypothesis $\epsilon = 1/2$ is possible if and only if
$a(1-2\epsilon_0)^2 > 1$.
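
The quantity $a(1-2\epsilon)^2$ in this comparison is the threshold parameter $\tau$ of equation (3) in Section 2, evaluated for this special case. The following short calculation is a sketch added for the reader; it assumes $b = a$ (since $p = q = a/n$) and the two-point label distributions given above.

```latex
\begin{align*}
a\mu(\pm 1) + a\nu(\pm 1) &= a, \qquad
a\mu(+1) - a\nu(+1) = a(1-2\epsilon), \qquad
a\mu(-1) - a\nu(-1) = -a(1-2\epsilon), \\
\tau &= \frac{1}{2}\sum_{\ell \in \{\pm 1\}}
  \frac{\big(a\mu(\ell) - a\nu(\ell)\big)^2}{a\mu(\ell) + a\nu(\ell)}
  = \frac{1}{2}\cdot 2 \cdot \frac{a^2(1-2\epsilon)^2}{a}
  = a(1-2\epsilon)^2 .
\end{align*}
```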

1.5 Outline 

Section 2 introduces the precise definition of the labeled stochastic block model to be studied and 
the key notations. The main theorems are introduced and briefly discussed in Section 3. The 
detailed proofs are presented in Section 4. Section 5 ends the paper with concluding remarks. 
Miscellaneous details and proofs are in the Appendix. 


2 Model and Notation 


This section formally defines the labeled stochastic block model with two symmetric communities
and introduces the key notations and definitions used in the paper. Let $\mathcal{L}$ denote a finite set.
The labeled stochastic block model $\mathcal{G}(n, p, q, \mu, \nu)$ is a random graph with $n$ vertices of $\{\pm 1\}$ types
indexed by $[n]$ and edges labeled by elements $\ell \in \mathcal{L}$. To generate a particular realization $(G, L, \sigma)$, first
assign type $\sigma_u \in \{\pm 1\}$ to each vertex $u$ uniformly and independently at random. Then, for every
vertex pair $(u, v)$, independently of everything else, draw an edge between $u$ and $v$ with probability
$p$ if $\sigma_u = \sigma_v$ and with probability $q$ otherwise. Finally, every edge $e = (u, v)$ is labeled with $\ell \in \mathcal{L}$
independently at random with probability $\mu(\ell)$ if $\sigma_u = \sigma_v$ and with probability $\nu(\ell)$ otherwise.
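
The generative procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_labeled_sbm`, the dictionary encoding of $\mu$ and $\nu$, and the `seed` parameter are hypothetical choices, not part of the paper. The sketch uses the sparse parametrization $p = a/n$, $q = b/n$ and a quadratic loop over vertex pairs, so it is meant for small $n$.

```python
import random

def sample_labeled_sbm(n, a, b, mu, nu, seed=None):
    """Sample (sigma, labels) from the labeled SBM with p = a/n and q = b/n.

    mu, nu: dicts mapping each label to its probability (assumed to sum to 1).
    Returns the type vector sigma and a dict {(u, v): label} with u < v.
    """
    rng = random.Random(seed)
    p, q = a / n, b / n
    # Each vertex type is +1 or -1, uniformly and independently.
    sigma = [rng.choice((-1, 1)) for _ in range(n)]
    labels = {}
    for u in range(n):
        for v in range(u + 1, n):
            same = sigma[u] == sigma[v]
            # Edge probability depends on whether the endpoints share a type.
            if rng.random() < (p if same else q):
                dist = mu if same else nu
                # Draw the label from mu (same type) or nu (different type).
                labels[(u, v)] = rng.choices(list(dist), weights=list(dist.values()))[0]
    return sigma, labels
```

For example, `sample_labeled_sbm(1000, 5, 1, {"r": 0.9, "b": 0.1}, {"r": 0.1, "b": 0.9})` draws a sparse graph whose edge labels are informative about the communities.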

Equivalently, we can specify $\mathcal{G}(n, p, q, \mu, \nu)$ by its probability distribution. Let

\[
\phi_{uv}(G, L, \sigma) =
\begin{cases}
p\,\mu(L_{uv}) & \text{if } \sigma_u = \sigma_v,\ (u,v) \in E(G), \\
q\,\nu(L_{uv}) & \text{if } \sigma_u \neq \sigma_v,\ (u,v) \in E(G), \\
1 - p & \text{if } \sigma_u = \sigma_v,\ (u,v) \notin E(G), \\
1 - q & \text{if } \sigma_u \neq \sigma_v,\ (u,v) \notin E(G),
\end{cases}
\]

where $E(G)$ is the set of edges of $G$ and $L_{uv}$ is the label on the edge $(u,v)$. Then,

\[
\mathbb{P}_n(G, L, \sigma) = 2^{-n} \prod_{(u,v):\, u<v} \phi_{uv}(G, L, \sigma). \tag{1}
\]


When $\mu = \nu$, it reduces to the classical stochastic block model without labels. This paper focuses
on the sparse case where $p = a/n$ and $q = b/n$ for two fixed constants $a$ and $b$, and the goal is
to reconstruct the true underlying types of vertices $\sigma$ by observing the graph structure $G$ and the
labels on edges $L$.

It is known that in the sparse graph, there are $\Theta(n)$ isolated vertices whose types clearly cannot
be recovered accurately. Therefore, our goal is to reconstruct a type assignment which is positively
correlated to the true type assignment. More formally, we adopt the following definition.

Definition 1. A type assignment $\hat\sigma$ is said to be positively correlated with the true type assignment
$\sigma$ if a.a.s.

\[
Q(\hat\sigma, \sigma) := \frac{1}{2} - \frac{1}{n} \min\{ d(\hat\sigma, \sigma),\ d(\hat\sigma, -\sigma) \} > 0, \tag{2}
\]

where $d$ is the Hamming distance, and $Q$ is called the overlap.
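
As a concrete illustration of Definition 1, the overlap can be computed as follows. This is a minimal sketch; the function name `overlap` and the list-of-±1 encoding of type assignments are assumptions made for the example.

```python
def overlap(sigma_hat, sigma):
    """Overlap Q of Definition 1: 1/2 - (1/n) * min(d(s_hat, s), d(s_hat, -s)).

    Both arguments are sequences of +1/-1 of the same length n.
    """
    n = len(sigma)
    # Hamming distance between sigma_hat and sigma.
    d_plus = sum(1 for x, y in zip(sigma_hat, sigma) if x != y)
    # For +1/-1 vectors, the distance to -sigma counts the agreeing positions.
    d_minus = n - d_plus
    return 0.5 - min(d_plus, d_minus) / n
```

The overlap equals 1/2 for a perfect reconstruction (up to a global sign flip) and 0 for an assignment that agrees with the truth on exactly half of the vertices.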


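The shorthand a.a.s. denotes asymptotically almost surely. A sequence of events $A_n$ holds a.a.s.
if the probability of $A_n$ converges to 1 as $n \to \infty$. Define $\tau$ as

\[
\tau = \frac{a+b}{2} \sum_{\ell \in \mathcal{L}} \frac{a\mu(\ell) + b\nu(\ell)}{a+b}
\left( \frac{a\mu(\ell) - b\nu(\ell)}{a\mu(\ell) + b\nu(\ell)} \right)^2 . \tag{3}
\]
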


It was conjectured in [16] that $\tau$ determines the threshold for positively correlated reconstruction.
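
To make the threshold parameter concrete, the sketch below evaluates $\tau$ from (3) in its simplified form $\tau = \frac{1}{2}\sum_\ell (a\mu(\ell) - b\nu(\ell))^2 / (a\mu(\ell) + b\nu(\ell))$. The function name `tau` and the dictionary encoding of the label distributions are assumptions made for illustration.

```python
def tau(a, b, mu, nu):
    """Threshold parameter of equation (3).

    mu, nu: dicts mapping labels to probabilities; missing labels count as 0.
    """
    total = 0.0
    for label in set(mu) | set(nu):
        plus = a * mu.get(label, 0.0) + b * nu.get(label, 0.0)
        minus = a * mu.get(label, 0.0) - b * nu.get(label, 0.0)
        # Labels with zero total mass never appear and contribute nothing.
        if plus > 0:
            total += minus * minus / plus
    return total / 2
```

With uninformative labels ($\mu = \nu$) this reduces to $(a-b)^2 / (2(a+b))$, so $\tau > 1$ matches the condition $(a-b)^2 > 2(a+b)$ for the unlabeled model.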

Conjecture 1. (i) If $\tau > 1$, then it is possible to find a type assignment correlated with the
true assignment a.a.s.

(ii) If $\tau < 1$, then it is impossible to find a type assignment correlated with the true assignment
a.a.s.


In this paper, we prove (ii) and propose three different algorithms able to find a type assignment
correlated with the true assignment for $\tau$ large enough.


Notation. Let $A$ denote the adjacency matrix of the graph $G$, $I$ denote the identity matrix, and
$J$ denote the all-one matrix. We write $X \succeq 0$ if $X$ is positive semidefinite and $X \ge 0$ if all the
entries of $X$ are non-negative. For any matrix $Y$, let $\|Y\|$ denote its spectral norm. For any positive
integer $n$, let $[n] = \{1, \ldots, n\}$. For any set $T \subset [n]$, let $|T|$ denote its cardinality and $T^c$ denote its
complement. We use standard big O notations, e.g., for any sequences $\{a_n\}$ and $\{b_n\}$, $a_n = \Theta(b_n)$
or $a_n \asymp b_n$ if there is an absolute constant $c > 0$ such that $1/c \le a_n/b_n \le c$. Let $\mathrm{Bern}(p)$ denote
the Bernoulli distribution with mean $p$ and $\mathrm{Binom}(N, p)$ denote the binomial distribution with $N$
trials and success probability $p$. All logarithms are natural and we use the convention $0 \log 0 = 0$.
For a vector $x \in \mathbb{R}^n$, $\mathrm{sign}(x)$ gives the sign of $x$ componentwise, and $\|x\|$ denotes the $L_2$ norm. For
a graph $G$, let $V(G)$ denote its vertex set and $E(G)$ denote its edge set.
3 Main Theorems


3.1 Minimum Bisection 


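To recover the community partition, one approach is maximum likelihood estimation. In
view of (1), the log-likelihood function can be written as:

\[
\log \mathbb{P}(G, L \mid \sigma)
= \frac{1}{2} \sum_{(u,v) \in E(G)} \left[ \log\frac{a\mu(L_{uv})}{b\nu(L_{uv})}\, \sigma_u\sigma_v
+ \log\frac{ab\,\mu(L_{uv})\nu(L_{uv})}{n^2} \right]
+ \frac{1}{2} \sum_{(u,v) \notin E(G)} \left[ \log\left(\frac{1 - a/n}{1 - b/n}\right) \sigma_u\sigma_v
+ \log\big((1 - a/n)(1 - b/n)\big) \right].
\]
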


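Under the constraint $\sum_u \sigma_u = 0$, the sum $\sum_{u<v} \sigma_u\sigma_v$ is constant, so the contribution of
the non-edges can be folded into the sum over edges, and the maximum likelihood estimation is equivalent to

\[
\begin{aligned}
\max_{\sigma} \quad & \sum_{(u,v) \in E(G)} \log\frac{a(1 - b/n)\mu(L_{uv})}{b(1 - a/n)\nu(L_{uv})}\, \sigma_u\sigma_v \\
\text{s.t.} \quad & \sum_u \sigma_u = 0, \quad \sigma \in \{\pm 1\}^n .
\end{aligned}
\]
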


This is equivalent to the minimum bisection on the weighted graph with the specific weight function
$w(\ell) = \log\frac{a(1-b/n)\mu(\ell)}{b(1-a/n)\nu(\ell)}$. For a general weight function $w : \mathcal{L} \to [-1, 1]$, the minimum bisection
finds a balanced bipartite subgraph in $G$ with the minimum weighted cut, i.e.,

\[
\begin{aligned}
\min_{\sigma} \quad & \sum_{(u,v):\, \sigma_u \neq \sigma_v} W_{uv} \\
\text{s.t.} \quad & \sum_u \sigma_u = 0, \quad \sigma_u \in \{\pm 1\},
\end{aligned} \tag{4}
\]

where $W_{uv} = A_{uv} w(L_{uv})$ and $A$ is the adjacency matrix of $G$.
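
For very small graphs, problem (4) can be solved by exhaustive search, which is a convenient way to sanity-check faster heuristics. The sketch below is an illustration only: the name `min_bisection` and the edge-dictionary input format are assumptions, $n$ is assumed even, and the running time is exponential in $n$.

```python
from itertools import combinations

def min_bisection(n, weighted_edges):
    """Brute-force solver for (4) on n vertices (n even).

    weighted_edges: dict {(u, v): W_uv} holding the nonzero weights.
    Returns (minimum cut weight, sigma) with sigma a list of +1/-1.
    """
    best_cut, best_sigma = None, None
    # Vertex 0 is fixed on the +1 side, which removes the sigma -> -sigma symmetry.
    for rest in combinations(range(1, n), n // 2 - 1):
        plus = {0, *rest}
        sigma = [1 if u in plus else -1 for u in range(n)]
        # Weight of the edges whose endpoints receive different types.
        cut = sum(w for (u, v), w in weighted_edges.items() if sigma[u] != sigma[v])
        if best_cut is None or cut < best_cut:
            best_cut, best_sigma = cut, sigma
    return best_cut, best_sigma
```

Labels that are more likely within a community receive positive weight and labels that are more likely across communities receive negative weight, so the minimizer prefers to cut negatively weighted edges and keep positively weighted ones uncut.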

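Theorem 1. Assume the technical condition: $\sum_\ell a\mu(\ell)w^2(\ell),\ \sum_\ell b\nu(\ell)w^2(\ell) > 8 \ln 2$. Then if

\[
\frac{\sum_\ell \big(a\mu(\ell) - b\nu(\ell)\big) w(\ell)}{\sqrt{\sum_\ell \big(a\mu(\ell) + b\nu(\ell)\big) w^2(\ell)}} > \sqrt{128 \ln 2}, \tag{5}
\]

a.a.s. solutions of the minimum bisection (4) are positively correlated to the true type assignment
$\sigma^*$. Moreover, the left hand side of (5) is maximized when $w(\ell) = \frac{a\mu(\ell) - b\nu(\ell)}{a\mu(\ell) + b\nu(\ell)}$, in which case (5)
reduces to $\tau > 64 \ln 2$.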
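
The role of the weight function in (5) can be checked numerically. The following sketch evaluates the left hand side of (5) for an arbitrary weight function and for the maximizing choice stated in Theorem 1; the helper names `snr` and `optimal_weights` are hypothetical, and $\mu$, $\nu$ are assumed to be dictionaries over the same label set with $a\mu(\ell) + b\nu(\ell) > 0$ for every label.

```python
import math

def snr(a, b, mu, nu, w):
    """Left hand side of (5) for a weight function w given as a dict over labels."""
    num = sum((a * mu[l] - b * nu[l]) * w[l] for l in w)
    den = math.sqrt(sum((a * mu[l] + b * nu[l]) * w[l] ** 2 for l in w))
    return num / den

def optimal_weights(a, b, mu, nu):
    """Maximizing weights w(l) = (a mu(l) - b nu(l)) / (a mu(l) + b nu(l))."""
    return {l: (a * mu[l] - b * nu[l]) / (a * mu[l] + b * nu[l]) for l in mu}
```

With the maximizing weights, `snr` equals $\sqrt{2\tau}$, which is why (5) becomes $\tau > 64 \ln 2$; these weights also lie in $[-1, 1]$ automatically, as the theorem requires.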


3.2 Semidefinite Relaxation Method

The minimum bisection is known to be NP-hard in the worst case [14, Theorem 1.3]. In this section,
we present a semidefinite relaxation of the minimum bisection (4) which is solvable in polynomial
time, and show it finds an assignment correlated with the true assignment provided $\tau$ is large
enough. Let $Y = \sigma\sigma^\top$. Then $\sigma_u = \pm 1$ is equivalent to $Y_{uu} = 1$, and $\sum_u \sigma_u = 0$ if and only if
$\langle Y, J \rangle = 0$. Therefore, (4) can be recast as

\[
\begin{aligned}
\max_{Y, \sigma} \quad & \langle W, Y \rangle \\
\text{s.t.} \quad & Y = \sigma\sigma^\top, \\
& Y_{uu} = 1, \quad u \in [n], \\
& \langle J, Y \rangle = 0.
\end{aligned} \tag{6}
\]
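
The two identities behind this recasting can be verified directly. The short derivation below is a sketch added for the reader; the sums run over ordered pairs $(u, v)$ and only the definitions $Y = \sigma\sigma^\top$ and $J = \mathbf{1}\mathbf{1}^\top$ are used.

```latex
\begin{align*}
\langle J, Y \rangle &= \sum_{u,v} \sigma_u \sigma_v = \Big( \sum_u \sigma_u \Big)^2, \\
\langle W, Y \rangle &= \sum_{u,v} W_{uv}\, \sigma_u \sigma_v
  = \sum_{u,v} W_{uv} - 2 \sum_{(u,v):\, \sigma_u \neq \sigma_v} W_{uv} .
\end{align*}
```

Since $\sum_{u,v} W_{uv}$ does not depend on $\sigma$, maximizing $\langle W, Y \rangle$ is the same as minimizing the weighted cut in (4).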
Notice that the matrix $Y = \sigma\sigma^\top$ is a rank-one positive semidefinite matrix. If we relax this
condition by dropping the rank-one restriction, we obtain the following semidefinite relaxation of
(6):

\[
\begin{aligned}
\hat Y_{\mathrm{SDP}} = \arg\max_{Y} \quad & \langle W, Y \rangle \\
\text{s.t.} \quad & Y \succeq 0, \\
& Y_{uu} = 1, \quad u \in [n], \\
& \langle J, Y \rangle = 0.
\end{aligned} \tag{7}
\]

To get an estimator of the type assignment from $\hat Y_{\mathrm{SDP}}$, let $y$ denote an eigenvector of $\hat Y_{\mathrm{SDP}}$ corresponding
to the largest eigenvalue, normalized so that $\|y\| = \sqrt{n}$. The following result shows that $\hat\sigma_{\mathrm{SDP}} = \mathrm{sign}(y)$
is positively correlated with the true type assignment.

Theorem 2. Assume the technical condition: $\sum_\ell w^2(\ell)\big(a\mu(\ell) + b\nu(\ell)\big) > 8 \ln 2$. If

\[
\frac{\sum_\ell \big(a\mu(\ell) - b\nu(\ell)\big) w(\ell)}{\sqrt{\sum_\ell \big(a\mu(\ell) + b\nu(\ell)\big) w^2(\ell)}} > 512 \sqrt{\ln 2}, \tag{8}
\]

then a.a.s. $\hat\sigma_{\mathrm{SDP}}$ is positively correlated to the true type assignment $\sigma^*$. Moreover, the left hand
side of (8) is maximized when $w(\ell) = \frac{a\mu(\ell) - b\nu(\ell)}{a\mu(\ell) + b\nu(\ell)}$, in which case (8) reduces to $\tau > 2^{17} \ln 2$.

In the stochastic block model without labels, i.e., $\mu = \nu$, condition (8) reduces to $(a - b)^2 >
2^{18} \ln 2\,(a + b)$; similar conditions with a different constant have been proved in [15, Theorem 1.1]
using Grothendieck's inequality. Our proof builds upon the analysis in [15].


3.3 Spectral Method 

In this section, we present a polynomial-time spectral algorithm based on the weighted adjacency 
matrix W and show that this algorithm allows us to find an assignment correlated with the true 
assignment provided $\tau$ is large enough.

Note that $\mathbb{E}[W \mid \sigma] = \frac{\beta}{n}\sigma\sigma^\top + \frac{\alpha}{n}J - \frac{\alpha+\beta}{n}I$ with

\[
\alpha = \frac{1}{2} \sum_\ell w(\ell)\big(a\mu(\ell) + b\nu(\ell)\big), \qquad
\beta = \frac{1}{2} \sum_\ell w(\ell)\big(a\mu(\ell) - b\nu(\ell)\big). \tag{9}
\]

The term $\frac{\alpha+\beta}{n}I$ is irrelevant to the main results (thanks to Weyl's perturbation theorem) and is
neglected for simplicity. Let $D = W - \frac{\alpha}{n}J$; then $\mathbb{E}[D \mid \sigma] = \frac{\beta}{n}\sigma\sigma^\top$ has rank one with singular value
$\beta$. Hence, it makes sense to define $\hat D$ as the best rank-1 approximation of the matrix $D$. In other
words, if $D = \sum_i v_i x_i x_i^\top$ is the eigenvalue decomposition of $D$ with eigenvalues $|v_1| \ge |v_2| \ge \ldots$,
we define $\hat D = v_1 x_1 x_1^\top$. Then if the matrix $D$ is close to its mean $\mathbb{E}[D \mid \sigma]$ in the spectral norm,
we expect $v_1$ to be close to $\beta$, and $\mathrm{sign}(x_1)$ to be correlated with $\sigma$. Unfortunately, in the sparse
regime, there are vertices of degree $\Omega(\log n / \log\log n)$ and thus the largest singular value of $W$ could reach
$\Omega(\sqrt{\log n / \log\log n})$, which is much higher than $\beta$. In order to take care of this issue, we begin with a
preliminary step to clean the spectrum of $W$: we remove all edges incident to vertices in the graph
with degree larger than […]. To summarize, for a given weight function $w(\ell)$, our algorithm
Spectral-Reconstruction has the following structure:
1. Remove edges incident to vertices with degree larger than […] and let $G'$ denote the resulting
graph. Define $W'$ to be the weighted adjacency matrix of $G'$.

2. Let $x$ be the left-singular vector associated with the largest singular value of $D' = W' - \frac{\alpha}{n}J$,
i.e.,

\[ x = \arg\max\{ |x^\top D' x| : \|x\| = 1 \}. \tag{10} \]

Output $\mathrm{sign}(x)$ for the types of the vertices.

Observe that (10) can be seen as a (non-convex) relaxation of the minimum bisection (4) by
replacing the integer constraint with the unit-norm constraint and relaxing the constraint $\sum_u \sigma_u =
0$ to be a regularized term $\frac{\alpha}{n} x^\top J x$ in the objective function. Spectral-Reconstruction needs
estimates of $\alpha$ and $a + b$, which can be well approximated by $\frac{1}{n}\mathbf{1}^\top W \mathbf{1}$ and $\frac{2}{n}\mathbf{1}^\top A \mathbf{1}$, respectively.
To simplify the analysis, we will assume that the exact values of $\alpha$ and $a + b$ are known.
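
The two steps of Spectral-Reconstruction can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `spectral_reconstruction`, the edge-dictionary input, and the parameter `max_degree` (which stands in for the degree threshold of step 1) are hypothetical. The leading singular vector in (10) is approximated here by plain power iteration, a substituted numerical technique that converges when the eigenvalue of $D'$ with largest absolute value is simple.

```python
import random

def spectral_reconstruction(n, weighted_edges, alpha, max_degree, iters=200, seed=0):
    """Sketch of Spectral-Reconstruction on a weighted graph.

    weighted_edges: dict {(u, v): W_uv} with u < v.
    alpha: the constant of (9); max_degree: assumed threshold for step 1.
    Returns a list of +1/-1 types.
    """
    # Step 1: remove every edge incident to a vertex of degree above max_degree.
    deg = [0] * n
    for (u, v) in weighted_edges:
        deg[u] += 1
        deg[v] += 1
    nbrs = [[] for _ in range(n)]
    for (u, v), w in weighted_edges.items():
        if deg[u] <= max_degree and deg[v] <= max_degree:
            nbrs[u].append((v, w))
            nbrs[v].append((u, w))
    # Step 2: power iteration on D' = W' - (alpha / n) J.
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        s = sum(x)  # (J x)_u equals sum(x) for every u
        y = [sum(w * x[v] for v, w in nbrs[u]) - alpha / n * s for u in range(n)]
        norm = sum(t * t for t in y) ** 0.5
        if norm == 0.0:
            break
        x = [t / norm for t in y]
    return [1 if t >= 0 else -1 for t in x]
```

Each iteration costs time proportional to the number of edges plus $n$, because the rank-one term $\frac{\alpha}{n}J$ is applied through the sum of the entries of $x$ rather than as a dense matrix.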


Theorem 3. Assume $a > b > C_0$ for some sufficiently large constant $C_0$. There exists a universal
constant $C$ (i.e., not depending on $a$, $b$, $\mu$ or $\nu$) such that if $\beta^2 > C(a + b)$, where $\beta$ is defined
in (9), then a.a.s. Spectral-Reconstruction outputs a type assignment correlated with the true
assignment. In the particular case where $w(\ell) = \frac{a\mu(\ell) - b\nu(\ell)}{a\mu(\ell) + b\nu(\ell)}$, the condition $\beta^2 > C(a + b)$ reduces
to $\tau > \sqrt{C(a + b)}$.

In the stochastic block model without labels, letting $w(\ell) = 1$, condition $\beta^2 > C(a + b)$ reduces
to $(a - b)^2 > 4C(a + b)$; the sharp condition $(a - b)^2 > 2(a + b)$ has been proved recently in [21, 19].
Compared to point (i) in Conjecture 1, our result does not give the right order of magnitude
when $a$ and $b$ are large. Indeed, we are able to improve it if we allow $a$ and $b$ to grow with $n$.

Theorem 4. Assume that $\min(a, b) = \Omega(\log^6 n)$. If

\[
\frac{\left[ \sum_\ell \big(a\mu(\ell) - b\nu(\ell)\big) w(\ell) \right]^2}{\sum_\ell \big(a\mu(\ell) + b\nu(\ell)\big) w^2(\ell)} > 256, \tag{11}
\]

then Spectral-Reconstruction outputs a type assignment correlated with the true assignment a.a.s.
Moreover, the left hand side of (11) is maximized when $w(\ell) = \frac{a\mu(\ell) - b\nu(\ell)}{a\mu(\ell) + b\nu(\ell)}$, in which case (11) reduces
to $\tau > 128$. With this choice of $w(\ell)$, as soon as $\tau \to \infty$, Spectral-Reconstruction outputs the
true assignment for all vertices except $o(n)$ a.a.s.

Note that in the regime $\min(a, b) = \Omega(\log^6 n)$, the degrees are very concentrated and step
1 of the algorithm can be removed without harm. The simulation results, depicted in Fig. 1,
further indicate that Spectral-Reconstruction leaving out step 1 outputs a positively correlated
assignment when above the threshold. In the simulation, we assume for simplicity only two labels,
$r$ and $b$, and define $\mu(r) = 0.5 + \epsilon$ and $\nu(r) = 0.5 - \epsilon$. We generate the graph from the labeled
stochastic block model with $n = 1000$ vertices for various $a$, $b$, $\epsilon$. Fixing $a$ and $b$, we plot the overlap $Q$
against $\epsilon$ and indicate the threshold $\tau = 1$ as a vertical dashed line. All plotted values are averages
over 100 trials.

Note that our algorithm is most efficient when the parameters ($a$, $b$, $\mu$ and $\nu$) of the model are
known, as the optimal weight function depends on these parameters. In the case where the labels
are uninformative, i.e., $\mu = \nu$, our algorithm is very simple, does not require knowledge of the values $a$
and $b$, and in the range of Theorem 4, has the best known performance guarantee (see [4, Table I]).

Figure 1: The overlap $Q$ against $\epsilon$ from 0.05 to 0.5.

3.4 Converse Result 

This section proves part (ii) of Conjecture 1. In particular, we show that when $\tau < 1$, asymptotically
it is impossible to tell whether any two vertices are more likely to belong to the same community.
It further implies that reconstructing a positively correlated type assignment is fundamentally
impossible.

Theorem 5. If $\tau < 1$, then for any fixed vertices $\rho$ and $v$,

\[ \mathbb{P}_n(\sigma_\rho = +1 \mid G, L, \sigma_v = +1) \to 1/2 \quad \text{a.a.s.} \tag{12} \]

Remark 1. Reconstructing a positively correlated type assignment is harder than telling whether
any two vertices are more likely to belong to the same community. In particular, given a positively
correlated type assignment $\hat\sigma$, two randomly chosen vertices are more likely to belong to
the same community if they have the same type in $\hat\sigma$.

Theorem 5 is related to the Ising spin model in statistical physics [10, 20], and it essentially
says that there is no long range correlation in the type assignment when $\tau < 1$. The main idea in
the proof of Theorem 5 is borrowed from [22] and works as follows: (1) Pick any two fixed vertices
$\rho$, $v$ and consider the local neighborhood of $\rho$ up to distance $O(\log n)$. The vertex $v$ lies outside
of the local neighborhood of $\rho$ a.a.s. (2) Conditional on the type assignment at the boundary of
the local neighborhood, $\sigma_\rho$ is asymptotically independent of $\sigma_v$. (3) The local neighborhood of
$\rho$ looks like a Markov process on a labeled Galton-Watson tree rooted at $\rho$. (4) For the Markov
process on the labeled Galton-Watson tree, the types of leaves provide no information about the
type of the root $\rho$ when the depth of the tree goes to infinity.

3.5 Hypothesis Testing 

Consider a labeled Erdős-Rényi random graph $\mathcal{G}\big(n, \frac{a+b}{2n}, \frac{a\mu + b\nu}{a+b}\big)$, where independently at random, each
pair of vertices is connected with probability $\frac{a+b}{2n}$ and every edge is labeled with $\ell \in \mathcal{L}$ with
probability $\frac{a\mu(\ell) + b\nu(\ell)}{a+b}$. Let $\mathbb{P}'_n$ denote the distribution of the labeled Erdős-Rényi random graph.
Given a graph $(G, L)$ which was drawn from either $\mathbb{P}_n$ or $\mathbb{P}'_n$, an interesting hypothesis testing
problem is to decide which one is the underlying distribution of $(G, L)$. It turns out that when
$\tau > 1$, the correct identification of the underlying distribution is feasible a.a.s.; however, when
$\tau < 1$, one is bound to make an error with non-vanishing probability.

Theorem 6. If $\tau > 1$, then $\mathbb{P}_n$ and $\mathbb{P}'_n$ are asymptotically orthogonal, i.e., there exists an event $A_n$
such that $\mathbb{P}_n(A_n) \to 1$ and $\mathbb{P}'_n(A_n) \to 0$.

If $\tau < 1$, then $\mathbb{P}_n$ and $\mathbb{P}'_n$ are contiguous, i.e., for every sequence of events $A_n$,

\[ \lim_{n \to \infty} \mathbb{P}_n(A_n) = 0 \iff \lim_{n \to \infty} \mathbb{P}'_n(A_n) = 0. \]

Theorem 6 further implies the following corollary regarding the model parameter estimation.

Corollary 1. If $\tau < 1$, then there is no consistent estimator for parameters $a, b, \mu, \nu$.

Proof. The second part of Theorem 6 implies that $\mathcal{G}(n, a_1/n, b_1/n, \mu_1, \nu_1)$ and $\mathcal{G}(n, a_2/n, b_2/n, \mu_2, \nu_2)$ are
contiguous as long as $a_1\mu_1(\ell) + b_1\nu_1(\ell) = a_2\mu_2(\ell) + b_2\nu_2(\ell)$ and

\[ \frac{1}{2} \sum_\ell \frac{\big(a_i\mu_i(\ell) - b_i\nu_i(\ell)\big)^2}{a_i\mu_i(\ell) + b_i\nu_i(\ell)} < 1 \]

for $i = 1, 2$. Therefore, one cannot distinguish between $\mathcal{G}(n, a_1/n, b_1/n, \mu_1, \nu_1)$ and $\mathcal{G}(n, a_2/n, b_2/n, \mu_2, \nu_2)$
with success probability converging to 1, and thus there is no consistent estimator for parameters
$a, b, \mu, \nu$. □

In the special case where $\mu = \nu$, i.e., no labeling information is available, Theorem 6 reduces to
Theorem 2.4 in [22]. The positive part of Theorem 6 is proved by counting the number of labeled
short cycles and the second moment method. The negative part of Theorem 6 is proved using
the small subgraph conditioning method as introduced in [22]. The small subgraph conditioning
method was originally developed to show that random d-regular graphs are Hamiltonian a.a.s.
[26, 17].

4 Proofs 

4.1 Proof of Theorem 1 

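Recall that $\sigma^*$ denotes the true type assignment. Since $|\{u : \sigma^*_u = 1\}| \sim \mathrm{Binom}(n, 1/2)$, by the Chernoff
bound, a.a.s.,

\[ |\{u : \sigma^*_u = 1\}| \in \left[ n/2 - \sqrt{n \log n},\ n/2 + \sqrt{n \log n} \right]. \tag{13} \]

For ease of presentation, assume $|\{u : \sigma^*_u = 1\}| = n/2$. Let $m(\sigma) = |\{u : \sigma_u = +1, \sigma^*_u = -1\}|$
and $\epsilon > 0$ be an arbitrarily small constant. To prove the theorem, by the definition of positively
correlated reconstruction, it suffices to show that for all $\sigma$ with $\frac{n}{4}(1 - \epsilon) \le m(\sigma) \le \frac{n}{4}$,

\[
\sum_{\substack{(u,v):\, \sigma_u \neq \sigma_v, \\ \sigma^*_u = \sigma^*_v}} W_{uv}
- \sum_{\substack{(u,v):\, \sigma_u = \sigma_v, \\ \sigma^*_u \neq \sigma^*_v}} W_{uv}
:= Y_1(\sigma) - Y_2(\sigma) > 0.
\]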
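To ease the notation, we suppress the argument $\sigma$. Observe that $Y_1$ is a sum of $2m(n/2 - m)$ i.i.d.
random variables whose value is $w(\ell)$ with probability $\frac{a}{n}\mu(\ell)$; $Y_2$ is a sum of $2m(n/2 - m)$ i.i.d.
random variables whose value is $w(\ell)$ with probability $\frac{b}{n}\nu(\ell)$. Thus,

\[
y_1 := \mathbb{E}[Y_1] = 2m(n/2 - m)\frac{a}{n} \sum_\ell \mu(\ell) w(\ell), \qquad
y_2 := \mathbb{E}[Y_2] = 2m(n/2 - m)\frac{b}{n} \sum_\ell \nu(\ell) w(\ell).
\]

Define

\[
z_1 := 2m(n/2 - m)\frac{a}{n} \sum_\ell \mu(\ell) w^2(\ell), \qquad
z_2 := 2m(n/2 - m)\frac{b}{n} \sum_\ell \nu(\ell) w^2(\ell).
\]

Then, for $0 < \lambda \le 1/2$,

\[
\begin{aligned}
\mathbb{E}[\exp(-\lambda Y_1)]
&= \left( 1 + \frac{a}{n} \sum_\ell \big(e^{-\lambda w(\ell)} - 1\big)\mu(\ell) \right)^{2m(n/2 - m)} \\
&\le \exp\left( 2m(n/2 - m)\frac{a}{n} \sum_\ell \big(e^{-\lambda w(\ell)} - 1\big)\mu(\ell) \right) \\
&\le \exp\left( 2m(n/2 - m)\frac{a}{n} \sum_\ell \big(-\lambda w(\ell) + 2\lambda^2 w^2(\ell)\big)\mu(\ell) \right) \\
&= \exp(-\lambda y_1 + 2\lambda^2 z_1),
\end{aligned}
\]

where the first inequality follows from the fact that $1 + x \le e^x$ and the second one follows from the
fact that $e^x \le 1 + x + 2x^2$ for $|x| \le 1/2$. The Chernoff bound gives that for $0 < \lambda \le 1/2$,

\[
\mathbb{P}(Y_1 \le (1 - t)y_1) \le \mathbb{E}[\exp(-\lambda Y_1)] \exp((1 - t)\lambda y_1)
\le \exp(-t\lambda y_1 + 2\lambda^2 z_1). \tag{14}
\]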
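
The bound (14) holds for every admissible $\lambda$, so $\lambda$ can be tuned to make the exponent as small as possible. The following short calculation is a sketch added for the reader; it only uses the quantities $t$, $y_1$, $z_1$ defined above and requires the resulting $\lambda$ to satisfy $\lambda \le 1/2$.

```latex
\begin{align*}
\frac{d}{d\lambda}\left( -t\lambda y_1 + 2\lambda^2 z_1 \right) = -t y_1 + 4\lambda z_1 = 0
\quad &\Longrightarrow \quad \lambda = \frac{t y_1}{4 z_1}, \\
-t\lambda y_1 + 2\lambda^2 z_1 \Big|_{\lambda = t y_1 / (4 z_1)}
&= -\frac{t^2 y_1^2}{4 z_1} + \frac{t^2 y_1^2}{8 z_1}
= -\frac{t^2 y_1^2}{8 z_1} .
\end{align*}
```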


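We define $\mathbb{E}[W_\mu] = \sum_\ell \mu(\ell) w(\ell)$ and $\mathbb{E}[W_\mu^2] = \sum_\ell \mu(\ell) w^2(\ell)$. Let

\[
t_1^2 = (64 \ln 2)\, \frac{1 + \epsilon}{1 - \epsilon}\, \frac{\mathbb{E}[W_\mu^2]}{a\, \mathbb{E}[W_\mu]^2}
\quad \text{and} \quad
\lambda = \frac{t_1 y_1}{4 z_1}.
\]

We first check that with these values, we have $\lambda \le 1/2$:

\[
\lambda \le \frac{1}{2}
\iff t_1 \le \frac{2\,\mathbb{E}[W_\mu^2]}{\mathbb{E}[W_\mu]}
\iff \frac{1 + \epsilon}{1 - \epsilon}\, \frac{8 \ln 2}{a} \le \mathbb{E}[W_\mu^2].
\]

Thanks to the assumption made in Theorem 1, we can find $\epsilon$ sufficiently small such that this last
inequality is valid. Notice that $\frac{t_1^2 y_1^2}{8 z_1} \ge (1 + \epsilon)\, n \ln 2$. It follows from (14) that

\[
\mathbb{P}(Y_1 \le (1 - t_1)y_1) \le \exp\left( -\frac{t_1^2 y_1^2}{8 z_1} \right) \le 2^{-n(1 + \epsilon)}.
\]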
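Since there are $\binom{n/2}{m}\binom{n/2}{m} \le 2^n$ different $\sigma$ with $m(\sigma) = m$, a simple union bound yields that as
$n \to \infty$,

\[ \mathbb{P}\left( \exists \sigma : (1 - \epsilon)n/4 \le m(\sigma) \le n/4,\ Y_1 \le (1 - t_1)y_1 \right) \to 0. \]

Similarly, let $t_2^2 = (64 \ln 2)\, \frac{1 + \epsilon}{1 - \epsilon}\, \frac{\mathbb{E}[W_\nu^2]}{b\, \mathbb{E}[W_\nu]^2}$ with $\mathbb{E}[W_\nu] = \sum_\ell \nu(\ell) w(\ell)$ and $\mathbb{E}[W_\nu^2] = \sum_\ell \nu(\ell) w^2(\ell)$.
Then

\[ \mathbb{P}\left( \exists \sigma : (1 - \epsilon)n/4 \le m(\sigma) \le n/4,\ Y_2 \ge (1 + t_2)y_2 \right) \to 0. \]

With $\epsilon$ sufficiently small, a.a.s.

\[
\begin{aligned}
Y_1 - Y_2 &\ge (1 - t_1)y_1 - (1 + t_2)y_2 \\
&= y_1 - y_2 - \frac{2m}{n}(n/2 - m) \sqrt{(64 \ln 2)\frac{1 + \epsilon}{1 - \epsilon}} \left( \sqrt{a\,\mathbb{E}[W_\mu^2]} + \sqrt{b\,\mathbb{E}[W_\nu^2]} \right) \\
&\ge \frac{2m}{n}(n/2 - m) \left( a\,\mathbb{E}[W_\mu] - b\,\mathbb{E}[W_\nu] - \sqrt{\frac{1 + \epsilon}{1 - \epsilon}} \sqrt{128 \ln 2} \sqrt{a\,\mathbb{E}[W_\mu^2] + b\,\mathbb{E}[W_\nu^2]} \right),
\end{aligned}
\]

which is larger than zero as soon as $\epsilon$ is sufficiently small and (5) is satisfied.

By the Cauchy-Schwarz inequality,

\[
\left( \sum_\ell \big(a\mu(\ell) - b\nu(\ell)\big) w(\ell) \right)^2 \le 2\tau \sum_\ell \big(a\mu(\ell) + b\nu(\ell)\big) w^2(\ell),
\]

with equality achieved when $w(\ell) = \frac{a\mu(\ell) - b\nu(\ell)}{a\mu(\ell) + b\nu(\ell)}$. This completes the proof.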

4.2 Proof of Theorem 2 

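Without loss of generality, assume (13) holds for $\sigma^*$. Let $Y^* = \sigma^*(\sigma^*)^\top$. By the optimality of $\hat Y_{\mathrm{SDP}}$,

\[
0 \le \langle W, \hat Y_{\mathrm{SDP}} \rangle - \langle W, Y^* \rangle
= \langle \mathbb{E}[W], \hat Y_{\mathrm{SDP}} - Y^* \rangle + \langle W - \mathbb{E}[W], \hat Y_{\mathrm{SDP}} - Y^* \rangle. \tag{15}
\]

Since $\mathbb{E}[W] = \frac{\alpha}{n}J + \frac{\beta}{n}Y^* - \frac{\alpha + \beta}{n}I$ with $\alpha, \beta$ defined in (9), and $\hat Y_{\mathrm{SDP}}$ is a feasible solution to (7),

\[
\langle \mathbb{E}[W], \hat Y_{\mathrm{SDP}} - Y^* \rangle
= \frac{\beta}{n} \langle Y^*, \hat Y_{\mathrm{SDP}} - Y^* \rangle - \frac{\alpha}{n} \langle J, Y^* \rangle
\le \frac{\beta}{n} \langle Y^*, \hat Y_{\mathrm{SDP}} - Y^* \rangle,
\]

where the last inequality holds because $\langle J, Y^* \rangle \ge 0$. In view of (15), it follows that

\[
\frac{\beta}{n} \langle Y^*, Y^* - \hat Y_{\mathrm{SDP}} \rangle \le \langle W - \mathbb{E}[W], \hat Y_{\mathrm{SDP}} - Y^* \rangle. \tag{16}
\]

Notice that

\[
\|Y^* - \hat Y_{\mathrm{SDP}}\|_F^2 = \|Y^*\|_F^2 + \|\hat Y_{\mathrm{SDP}}\|_F^2 - 2\langle Y^*, \hat Y_{\mathrm{SDP}} \rangle
\le 2\big(n^2 - \langle Y^*, \hat Y_{\mathrm{SDP}} \rangle\big) = 2\langle Y^*, Y^* - \hat Y_{\mathrm{SDP}} \rangle.
\]

It follows from (16) that

\[
\frac{\beta}{2n} \|Y^* - \hat Y_{\mathrm{SDP}}\|_F^2
\le \langle W - \mathbb{E}[W], \hat Y_{\mathrm{SDP}} - Y^* \rangle
\le |\langle W - \mathbb{E}[W], \hat Y_{\mathrm{SDP}} \rangle| + |\langle W - \mathbb{E}[W], Y^* \rangle|. \tag{17}
\]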
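To upper bound $|\langle W - \mathbb{E}[W], Y^* \rangle|$, notice that

\[ \langle W - \mathbb{E}[W], Y^* \rangle = 2 \sum_{i<j} Y^*_{ij} \big( W_{ij} - \mathbb{E}[W_{ij}] \big). \]

Let $\sigma^2 = \sum_{i<j} \mathrm{var}[W_{ij}] = (1 + o(1))\frac{n}{4} \sum_\ell w^2(\ell)\big(a\mu(\ell) + b\nu(\ell)\big)$. By the Bernstein inequality given
in Theorem 8, for any $t > 0$,

\[
\mathbb{P}\left\{ \Big| \sum_{i<j} Y^*_{ij} \big( W_{ij} - \mathbb{E}[W_{ij}] \big) \Big| > \sqrt{2\sigma^2 t} + \frac{2}{3}t \right\} \le 2e^{-t}.
\]

Letting $t = \log n = o(\sigma^2)$, it follows that with probability at least $1 - 2n^{-1}$,

\[
\Big| \sum_{i<j} Y^*_{ij} \big( W_{ij} - \mathbb{E}[W_{ij}] \big) \Big|
\le (1 + o(1)) \sqrt{ n \log n \sum_\ell w^2(\ell)\big(a\mu(\ell) + b\nu(\ell)\big) },
\]

and thus $|\langle W - \mathbb{E}[W], Y^* \rangle| \le (2 + o(1)) \sqrt{ n \log n \sum_\ell w^2(\ell)\big(a\mu(\ell) + b\nu(\ell)\big) }$ with probability at least
$1 - 2n^{-1}$.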

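We bound $|\langle W - \mathbb{E}[W], \hat Y_{\mathrm{SDP}} \rangle|$ next. It follows from Grothendieck's inequality [15, Theorem
3.4] that

\[
|\langle W - \mathbb{E}[W], \hat Y_{\mathrm{SDP}} \rangle|
\le \sup_{Y \succeq 0,\ \mathrm{diag}(Y) = I} |\langle W - \mathbb{E}[W], Y \rangle|
\le K_G \|W - \mathbb{E}[W]\|_{\infty \to 1},
\]

where $K_G$ is an absolute constant known as the Grothendieck constant and it is known that $K_G <
\frac{\pi}{2 \ln(1 + \sqrt{2})} \le 1.783$. Moreover,

\[
\|W - \mathbb{E}[W]\|_{\infty \to 1}
= \sup_{x:\, \|x\|_\infty \le 1} \|(W - \mathbb{E}[W])x\|_1
= \sup_{x, y \in \{\pm 1\}^n} x^\top (W - \mathbb{E}[W]) y
= \sup_{x, y \in \{\pm 1\}^n} \sum_{i<j} \big( W_{ij} - \mathbb{E}[W_{ij}] \big)(x_i y_j + x_j y_i).
\]

For any fixed $x, y \in \{\pm 1\}^n$, using the Bernstein inequality, we have for any $t > 0$,

\[
\mathbb{P}\left\{ \sum_{i<j} \big( W_{ij} - \mathbb{E}[W_{ij}] \big)(x_i y_j + x_j y_i) > \sqrt{8\sigma^2 t} + \frac{4}{3}t \right\} \le e^{-t}.
\]

Hence, for an arbitrarily small constant $\epsilon > 0$, with probability at least $1 - 2^{-2(1 + \epsilon)n}$,

\[
\begin{aligned}
\sum_{i<j} \big( W_{ij} - \mathbb{E}[W_{ij}] \big)(x_i y_j + x_j y_i)
&\le n \sqrt{ 8 \ln 2\,(1 + \epsilon) \sum_\ell w^2(\ell)\big(a\mu(\ell) + b\nu(\ell)\big) } + \frac{8 \ln 2\,(1 + \epsilon)}{3} n \\
&\overset{(a)}{\le} \frac{4n}{3} \sqrt{ 8 \ln 2\,(1 + \epsilon) \sum_\ell w^2(\ell)\big(a\mu(\ell) + b\nu(\ell)\big) },
\end{aligned}
\]

where (a) follows from the technical assumption $\sum_\ell w^2(\ell)\big(a\mu(\ell) + b\nu(\ell)\big) > 8 \ln 2$. It follows from
the union bound that with probability at least $1 - 4^{-\epsilon n}$,

\[
\|W - \mathbb{E}[W]\|_{\infty \to 1} \le \frac{4n}{3} \sqrt{ 8 \ln 2\,(1 + \epsilon) \sum_\ell w^2(\ell)\big(a\mu(\ell) + b\nu(\ell)\big) }.
\]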
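In view of (17), with probability at least $1 - 4^{-\epsilon n} - 2n^{-1}$,

\[
\begin{aligned}
\frac{1}{n^2} \|Y^* - \hat Y_{\mathrm{SDP}}\|_F^2
&\le (1 + o(1)) \frac{8 K_G}{3\beta} \sqrt{ 8 \ln 2\,(1 + \epsilon) \sum_\ell w^2(\ell)\big(a\mu(\ell) + b\nu(\ell)\big) } \\
&\overset{(a)}{\le} (1 + o(1))\, 32 \sqrt{\ln 2\,(1 + \epsilon)}\, \frac{ \sqrt{ \sum_\ell w^2(\ell)\big(a\mu(\ell) + b\nu(\ell)\big) } }{ \sum_\ell w(\ell)\big(a\mu(\ell) - b\nu(\ell)\big) } \\
&\overset{(b)}{\le} \frac{1}{1 - \epsilon} \cdot \frac{1}{16},
\end{aligned} \tag{18}
\]

where (a) follows by $\sqrt{2} K_G \le 3$ and the definition of $\beta$ given in (9); (b) holds by invoking (8) and
letting $\epsilon$ be sufficiently small.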

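Recall that $y$ is an eigenvector of $\hat Y_{\mathrm{SDP}}$ corresponding to the largest eigenvalue and $\|y\| = \sqrt{n}$.
By the Davis-Kahan $\sin\theta$ theorem stated in Lemma 5,

\[
\frac{1}{\sqrt{n}} \min\{ \|\sigma^* - y\|, \|\sigma^* + y\| \}
\le \frac{2\sqrt{2}\, \|\hat Y_{\mathrm{SDP}} - Y^*\|}{n}
\le \frac{2\sqrt{2}\, \|\hat Y_{\mathrm{SDP}} - Y^*\|_F}{n}.
\]

Note that for any $x \in \mathbb{R}^n$, the Hamming distance satisfies $d(\sigma^*, \mathrm{sign}(x)) \le \|\sigma^* - x\|^2$. It follows that

\[
\frac{1}{n} \min\{ d(\sigma^*, \mathrm{sign}(y)),\ d(\sigma^*, \mathrm{sign}(-y)) \} \le \frac{8\, \|\hat Y_{\mathrm{SDP}} - Y^*\|_F^2}{n^2},
\]

and the theorem holds in view of (18).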


4.3 Proof of Theorem 3 

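Recall that $W'$ is the weighted adjacency matrix after removal of edges incident to vertices with
high degrees and $D' = W' - \frac{\alpha}{n}J$. Define $\hat D'$ as the best rank-1 approximation of $D'$ such that
$\hat D' = v_1 x x^\top$ with $\|x\| = 1$. Recall that $\mathbb{E}[D \mid \sigma] = \frac{\beta}{n}\sigma\sigma^\top$. Applying the Davis-Kahan $\sin\theta$ theorem
restated in Lemma 5 with $D'$ and $\mathbb{E}[D \mid \sigma]$ gives:

\[
\min\left\{ \left\| \frac{\sigma}{\sqrt{n}} - x \right\|, \left\| \frac{\sigma}{\sqrt{n}} + x \right\| \right\}
\le \frac{2\sqrt{2}}{\beta} \|D' - \mathbb{E}[D \mid \sigma]\|.
\]

Since the Hamming distance satisfies $d(\sigma, \mathrm{sign}(x)) \le \|\sigma - \sqrt{n}\,x\|^2$, it follows that

\[
\frac{1}{n} \min\{ d(\sigma, \mathrm{sign}(x)),\ d(\sigma, -\mathrm{sign}(x)) \}
\le \frac{8}{\beta^2} \|D' - \mathbb{E}[D \mid \sigma]\|^2
= \frac{8}{\beta^2} \|W' - \mathbb{E}[W \mid \sigma]\|^2. \tag{19}
\]

Lemma 6 implies that a.a.s. $\|W' - \mathbb{E}[W \mid \sigma]\| \le C\sqrt{a + b}$ for some universal positive constant $C$.
Hence, in view of (19), we get

\[
\frac{1}{n} \min\{ d(\sigma, \mathrm{sign}(x)),\ d(\sigma, -\mathrm{sign}(x)) \} \le 8C^2\, \frac{a + b}{\beta^2},
\]

and the theorem follows.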
4.4 Proof of Theorem 4


The proof follows the same steps as for Theorem 3, except that we are able to strengthen Lemma
6 thanks to a result of Vu [30]. Note that the variance of the elements of $W$ is upper bounded by
$\frac{1}{n} \sum_\ell w^2(\ell)\big(a\mu(\ell) + b\nu(\ell)\big)$ so that by Theorem 1.4 in [30], we get

Lemma 1. Under the conditions of Theorem 4, we have

\[
\|W - \mathbb{E}[W \mid \sigma]\| \le 2 \sqrt{ \sum_\ell w^2(\ell)\big(a\mu(\ell) + b\nu(\ell)\big) }
\]

a.a.s.


4.5 Proof of Theorem 5 

Consider a Galton-Watson tree $T$ with Poisson offspring distribution with mean $\frac{a+b}{2}$. The type of the root $\rho$ is chosen from $\{\pm 1\}$ uniformly at random. Each child has the same type as its parent with probability $\frac{a}{a+b}$ and a different type with probability $\frac{b}{a+b}$. Every edge $(u,v)$ is labeled at random with distribution $\mu$ if $\sigma_u = \sigma_v$ and $\nu$ otherwise. Let $T_R$ denote the Galton-Watson tree $T$ up to depth $R$ and $\partial T_R$ denote the set of leaves of $T_R$. Let $G_R$ denote the subgraph of $G$ induced by vertices up to distance $R$ from $\rho$ and $\partial G_R$ be the set of vertices at distance $R$ from $\rho$.
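As an illustration of the branching process just described, the following is a minimal sampling sketch. It assumes offspring mean $(a+b)/2$, same-type probability $a/(a+b)$, and label laws $\mu$ (same type) and $\nu$ (different type), consistent with the Poisson means $a/2$ and $b/2$ quoted in Appendix C. The function names and the tuple layout of a node are hypothetical.

```python
import math
import random

def poisson(rng, mean):
    """Knuth's multiplication method; adequate for small means."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def sample_labeled_gw(a, b, mu, nu, depth, rng):
    """Sample the tree up to `depth`.

    Returns a list of nodes (parent_index, type, edge_label, depth);
    node 0 is the root, with parent and edge label set to None.
    """
    nodes = [(None, rng.choice((1, -1)), None, 0)]
    frontier = [0]
    for d in range(1, depth + 1):
        next_frontier = []
        for parent in frontier:
            parent_type = nodes[parent][1]
            for _ in range(poisson(rng, (a + b) / 2)):
                same = rng.random() < a / (a + b)
                weights = mu if same else nu
                label = rng.choices(range(len(weights)), weights=weights)[0]
                nodes.append((parent, parent_type if same else -parent_type, label, d))
                next_frontier.append(len(nodes) - 1)
        frontier = next_frontier
    return nodes

tree = sample_labeled_gw(5.0, 2.0, [0.7, 0.3], [0.4, 0.6], 3, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the sample reproducible; the leaves of $T_R$ are the nodes whose stored depth equals `depth`.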

The following lemma, similar to Proposition 4.2 in [22], establishes a coupling between the local neighborhood of $\rho$ and the labeled Galton-Watson tree rooted at $\rho$.

Lemma 2. Let $R = R(n) = \left\lfloor \frac{\log n}{10 \log(2(a+b))} \right\rfloor$. Then there exists a coupling such that a.a.s.

$$(G_R, L_{G_R}, \sigma_{G_R}) = (T_R, L_{T_R}, \sigma_{T_R}),$$

where $L_{G_R}$ and $\sigma_{G_R}$ denote the labels and types on the subgraph $G_R$, respectively.

Proof. See proof in Section C. □

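To ease notation, we omit the shorthand a.a.s. in the sequel. To prove Theorem 5, it suffices to show that $\mathrm{Var}(\sigma_\rho | G, L, \sigma_v) \to 1$. By the law of total variance,

$$\mathrm{Var}(\sigma_\rho | G, L, \sigma_v) = \mathbb{E}_{\sigma_{\partial G_R}}\left[\mathrm{Var}(\sigma_\rho | G, L, \sigma_v, \sigma_{\partial G_R})\right] + \mathrm{Var}_{\sigma_{\partial G_R}}\left[\mathbb{E}[\sigma_\rho | G, L, \sigma_v, \sigma_{\partial G_R}]\right].$$

Hence, it further reduces to showing that $\mathrm{Var}(\sigma_\rho | G, L, \sigma_v, \sigma_{\partial G_R}) \to 1$.

Let $R$ be as in Lemma 2; then $|G_R| = o(\sqrt{n})$ and thus $v \notin G_R$. Lemma 4.7 in [22] shows that $\sigma_\rho$ is asymptotically independent of $\sigma_v$ conditionally on $\sigma_{\partial G_R}$. Hence,

$$\mathrm{Var}(\sigma_\rho | G, L, \sigma_v, \sigma_{\partial G_R}) \to \mathrm{Var}(\sigma_\rho | G, L, \sigma_{\partial G_R}).$$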

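Let $G_R^c$ denote the subgraph of $G$ induced by the edges not in $G_R$, and $L_{G_R^c}$ denote the set of labels on $G_R^c$. Recall that $V(G_{R-1})$ and $V(G_R^c)$ denote the sets of vertices in $G_{R-1}$ and $G_R^c$, respectively. Let $S = V(G_{R-1}) \setminus \{\rho\}$ and $T = V(G_R^c) \setminus \partial G_R$. Then $\{\rho\} \cup \partial G_R \cup S \cup T = V(G)$. Notice that conditional on $(G_R, L_{G_R}, \sigma_{\partial G_R})$, $\sigma_\rho$ is independent of $(G_R^c, L_{G_R^c})$. In particular,

$$
\begin{aligned}
\mathbb{P}\{\sigma_\rho | G_R, L_{G_R}, \sigma_{\partial G_R}\}
&= \frac{\sum_{G_R^c, L_{G_R^c}} \mathbb{P}\{\sigma_\rho, G, L, \sigma_{\partial G_R}\}}{\sum_{G_R^c, L_{G_R^c}} \mathbb{P}\{G, L, \sigma_{\partial G_R}\}} \\
&= \frac{\sum_{G_R^c, L_{G_R^c}} \left( \sum_{\sigma_S} \sum_{\sigma_T} \prod_{u,v \in V(G_R): u<v} \phi_{uv} \prod_{u,v \in T: u<v} \phi_{uv} \prod_{u \in \partial G_R, v \in T} \phi_{uv} \right)}{\sum_{G_R^c, L_{G_R^c}} \left( \sum_{\sigma_\rho} \sum_{\sigma_S} \sum_{\sigma_T} \prod_{u,v \in V(G_R): u<v} \phi_{uv} \prod_{u,v \in T: u<v} \phi_{uv} \prod_{u \in \partial G_R, v \in T} \phi_{uv} \right)} \\
&\overset{(a)}{=} \frac{\left( \sum_{\sigma_S} \prod_{u,v \in V(G_R): u<v} \phi_{uv} \right) \sum_{G_R^c, L_{G_R^c}} \left( \sum_{\sigma_T} \prod_{u,v \in T: u<v} \phi_{uv} \prod_{u \in \partial G_R, v \in T} \phi_{uv} \right)}{\left( \sum_{\sigma_\rho} \sum_{\sigma_S} \prod_{u,v \in V(G_R): u<v} \phi_{uv} \right) \sum_{G_R^c, L_{G_R^c}} \left( \sum_{\sigma_T} \prod_{u,v \in T: u<v} \phi_{uv} \prod_{u \in \partial G_R, v \in T} \phi_{uv} \right)} \\
&= \frac{\sum_{\sigma_S} \prod_{u,v \in V(G_R): u<v} \phi_{uv}}{\sum_{\sigma_\rho} \sum_{\sigma_S} \prod_{u,v \in V(G_R): u<v} \phi_{uv}} \\
&= \frac{\left( \sum_{\sigma_S} \prod_{u,v \in V(G_R): u<v} \phi_{uv} \right) \left( \sum_{\sigma_T} \prod_{u,v \in T: u<v} \phi_{uv} \prod_{u \in \partial G_R, v \in T} \phi_{uv} \right)}{\left( \sum_{\sigma_\rho} \sum_{\sigma_S} \prod_{u,v \in V(G_R): u<v} \phi_{uv} \right) \left( \sum_{\sigma_T} \prod_{u,v \in T: u<v} \phi_{uv} \prod_{u \in \partial G_R, v \in T} \phi_{uv} \right)} \\
&= \frac{\mathbb{P}\{\sigma_\rho, G, L, \sigma_{\partial G_R}\}}{\mathbb{P}\{G, L, \sigma_{\partial G_R}\}} = \mathbb{P}\{\sigma_\rho | G, L, \sigma_{\partial G_R}\},
\end{aligned}
$$

where (a) holds because $\sum_{\sigma_S} \prod_{u,v \in V(G_R): u<v} \phi_{uv}$ does not depend on $G_R^c$ and $L_{G_R^c}$. It follows that

$$\mathrm{Var}(\sigma_\rho | G, L, \sigma_{\partial G_R}) = \mathrm{Var}(\sigma_\rho | G_R, L_{G_R}, \sigma_{\partial G_R}).$$

Lemma 2 implies that

$$\mathrm{Var}(\sigma_\rho | G_R, L_{G_R}, \sigma_{\partial G_R}) \to \mathrm{Var}(\sigma_\rho | T_R, L_{T_R}, \sigma_{\partial T_R}).$$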

For the labeled Galton-Watson tree, it was shown in [16] that if $\tau < 1$, the types of the leaves provide no information about the type of the root when the depth $R \to \infty$, i.e.,

$$\mathbb{P}(\sigma_\rho = +1 | T_R, L_{T_R}, \sigma_{\partial T_R}) \to \frac{1}{2}.$$

Hence, $\mathrm{Var}(\sigma_\rho | T_R, L_{T_R}, \sigma_{\partial T_R}) \to 1$ and the theorem follows.

4.6 Proof of Theorem 6 

We introduce some necessary notation. For a graph $G$ with $n$ vertices and labeled edges, denote a $k$-sequence of labels by $[\ell]_k = (\ell_1, \ell_2, \ldots, \ell_k) \in \mathcal{L}^k$. A cycle in $G$ is called a $k$-cycle with labels $[\ell]_k$ if, starting from the vertex with the minimum index and ending at its neighbor with the smaller index among its two neighbors, the sequence of labels on edges is given by $[\ell]_k$. Let $X_n([\ell]_k)$ denote the number of $k$-cycles with labels $[\ell]_k$ in $G$. Let $(X)_j = X(X-1) \cdots (X-j+1)$ for integers $X$ and $1 \le j \le X$. Then $(X_n([\ell]_k))_j$ is the number of ordered $j$-tuples of $k$-cycles with labels $[\ell]_k$ in $G$. The product $\prod_{[\ell]_k}$ is assumed to be taken over all possible sequences of labels with length $k$. The following lemma gives the asymptotic distribution of the number of $k$-cycles with labels $[\ell]_k$.

Lemma 3. For any fixed integer $m \ge 3$, $\{X_n([\ell]_k) : [\ell]_k \in \mathcal{L}^k\}_{k=3}^{m}$ jointly converge to independent Poisson random variables with mean $\lambda([\ell]_k)$ under graph distribution $\mathbb{P}'_n$, and $\xi([\ell]_k)$ under graph distribution $\mathbb{P}_n$, where

$$\lambda([\ell]_k) = \frac{1}{2^{k+1} k} \prod_{i=1}^{k} \left(a\mu(\ell_i) + b\nu(\ell_i)\right),$$

$$\xi([\ell]_k) = \frac{1}{2^{k+1} k} \left( \prod_{i=1}^{k} \left(a\mu(\ell_i) + b\nu(\ell_i)\right) + \prod_{i=1}^{k} \left(a\mu(\ell_i) - b\nu(\ell_i)\right) \right).$$

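We are ready to prove Theorem 6. The first part of Theorem 6 is proved using Lemma 3 and Chebyshev's inequality. Define $\eta([\ell]_k) = \xi([\ell]_k)/\lambda([\ell]_k) - 1$ and $X_k = \sum_{[\ell]_k} X_n([\ell]_k)\,\eta([\ell]_k)$. Then, by Lemma 3, as $n \to \infty$,

$$\mathbb{E}_{\mathbb{P}'}[X_k] \to \sum_{[\ell]_k} \lambda([\ell]_k)\,\eta([\ell]_k),$$

$$\mathbb{E}_{\mathbb{P}}[X_k] \to \sum_{[\ell]_k} \lambda([\ell]_k)\,\eta([\ell]_k)\,(1 + \eta([\ell]_k)).$$

Note that

$$2k \sum_{[\ell]_k} \lambda([\ell]_k)\,\eta^2([\ell]_k) = \sum_{[\ell]_k} \prod_{s=1}^{k} \frac{(a\mu(\ell_s) - b\nu(\ell_s))^2}{2(a\mu(\ell_s) + b\nu(\ell_s))} = \left( \sum_{\ell} \frac{(a\mu(\ell) - b\nu(\ell))^2}{2(a\mu(\ell) + b\nu(\ell))} \right)^{k} = \tau^k.$$

Therefore,

$$\mathbb{E}_{\mathbb{P}}[X_k] - \mathbb{E}_{\mathbb{P}'}[X_k] = \sum_{[\ell]_k} \lambda([\ell]_k)\,\eta^2([\ell]_k) = \tau^k/(2k),$$

and

$$\mathrm{Var}_{\mathbb{P}'}[X_k] = \sum_{[\ell]_k} \lambda([\ell]_k)\,\eta^2([\ell]_k) = \tau^k/(2k),$$

$$\mathrm{Var}_{\mathbb{P}}[X_k] = \sum_{[\ell]_k} \xi([\ell]_k)\,\eta^2([\ell]_k) \le \tau^k/k. \qquad (20)$$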
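The identity $\sum_{[\ell]_k} \lambda([\ell]_k)\eta^2([\ell]_k) = \tau^k/(2k)$ can be checked by brute-force enumeration over label sequences. The sketch below assumes the closed forms $\lambda([\ell]_k) = \frac{1}{2k}\prod_i \frac{a\mu(\ell_i)+b\nu(\ell_i)}{2}$ and $\xi([\ell]_k) = \frac{1}{2k}\left(\prod_i \frac{a\mu(\ell_i)+b\nu(\ell_i)}{2} + \prod_i \frac{a\mu(\ell_i)-b\nu(\ell_i)}{2}\right)$, as read off from Lemma 3 and Appendix D, and $\tau = \sum_\ell \frac{(a\mu(\ell)-b\nu(\ell))^2}{2(a\mu(\ell)+b\nu(\ell))}$. All function names and the numeric parameters are illustrative.

```python
from itertools import product
from math import prod

def tau(a, b, mu, nu):
    return sum((a * m - b * v) ** 2 / (2 * (a * m + b * v)) for m, v in zip(mu, nu))

def lam(a, b, mu, nu, labels):
    """Limiting mean cycle count under the labeled Erdos-Renyi model."""
    return prod((a * mu[l] + b * nu[l]) / 2 for l in labels) / (2 * len(labels))

def xi(a, b, mu, nu, labels):
    """Limiting mean cycle count under the labeled stochastic block model."""
    plus = prod((a * mu[l] + b * nu[l]) / 2 for l in labels)
    minus = prod((a * mu[l] - b * nu[l]) / 2 for l in labels)
    return (plus + minus) / (2 * len(labels))

def sum_lambda_eta_sq(a, b, mu, nu, k):
    """Sum of lambda * eta^2 over all label sequences of length k."""
    total = 0.0
    for labels in product(range(len(mu)), repeat=k):
        lm = lam(a, b, mu, nu, labels)
        eta = xi(a, b, mu, nu, labels) / lm - 1
        total += lm * eta ** 2
    return total

a, b = 5.0, 2.0
mu, nu = [0.7, 0.3], [0.4, 0.6]
k = 4
lhs = sum_lambda_eta_sq(a, b, mu, nu, k)
rhs = tau(a, b, mu, nu) ** k / (2 * k)
```

The enumeration costs $|\mathcal{L}|^k$ terms, so it is only practical for small label sets and short cycles.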


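Choose $\rho = \tau^k/(6k)$. By Chebyshev's inequality,

$$\mathbb{P}'\{X_k > \mathbb{E}_{\mathbb{P}'}[X_k] + \rho\} \le \frac{\mathrm{Var}_{\mathbb{P}'}[X_k]}{\rho^2} = \frac{18k}{\tau^k}.$$

Let $k$ increase with $n$ sufficiently slowly. Then, since $\tau > 1$, $X_k \le \mathbb{E}_{\mathbb{P}'}[X_k] + \rho$ $\mathbb{P}'$-a.a.s. Similarly, $X_k \ge \mathbb{E}_{\mathbb{P}}[X_k] - \rho$ $\mathbb{P}$-a.a.s. By definition of $\rho$, $\mathbb{E}_{\mathbb{P}}[X_k] - \rho > \mathbb{E}_{\mathbb{P}'}[X_k] + \rho$. Set $A_n = \{X_k \le \mathbb{E}_{\mathbb{P}'}[X_k] + \rho\}$; then $\mathbb{P}'(A_n) \to 1$ and $\mathbb{P}(A_n) \to 0$.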

The second part of Theorem 6 is proved using the following small subgraph conditioning theorem, which is adapted from [17, Theorem 9.12].

Theorem 7. Let $Y_n = \frac{d\mathbb{P}_n}{d\mathbb{P}'_n}$. If $\mathbb{P}_n$ and $\mathbb{P}'_n$ are absolutely continuous for any fixed $n$, and

1. For each fixed $m \ge 3$, $\{X_n([\ell]_k)\}_{k=3}^{m}$ converge jointly to independent Poisson variables with means $\lambda([\ell]_k) > 0$ under distribution $\mathbb{P}'_n$, and $\xi([\ell]_k)$ under distribution $\mathbb{P}_n$;

2. $\sum_{k \ge 3} \sum_{[\ell]_k} \lambda([\ell]_k)\,\eta([\ell]_k)^2 < \infty$;

3. $\mathbb{E}_{\mathbb{P}'_n}[Y_n^2] \to \exp\left( \sum_{k \ge 3} \sum_{[\ell]_k} \lambda([\ell]_k)\,\eta^2([\ell]_k) \right)$ as $n \to \infty$,

then $\mathbb{P}_n$ and $\mathbb{P}'_n$ are contiguous.

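In this paper, $\mathbb{P}_n$ and $\mathbb{P}'_n$ are discrete distributions on the space of labeled graphs, and for any fixed $n$, $\mathbb{P}_n$ and $\mathbb{P}'_n$ are absolutely continuous. Condition 1) is verified by Lemma 3. Condition 2) holds because, in view of (20),

$$\sum_{k \ge 3} \sum_{[\ell]_k} \lambda([\ell]_k)\,\eta^2([\ell]_k) = \sum_{k \ge 3} \frac{\tau^k}{2k} = -\frac{\log(1-\tau) + \tau + \tau^2/2}{2} < \infty. \qquad (21)$$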


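We are left to verify condition 3). By definition,

$$Y_n(G, L) = 2^{-n} \sum_{\sigma \in \{\pm 1\}^n} \prod_{(u,v): u<v} W_{uv}(G, L, \sigma),$$

where

$$
W_{uv}(G, L, \sigma) =
\begin{cases}
\dfrac{2a\mu(\ell)}{a\mu(\ell) + b\nu(\ell)} & \text{if } \sigma_u = \sigma_v,\ (u,v) \in E(G),\ L_{uv} = \ell, \\[2mm]
\dfrac{2b\nu(\ell)}{a\mu(\ell) + b\nu(\ell)} & \text{if } \sigma_u \ne \sigma_v,\ (u,v) \in E(G),\ L_{uv} = \ell, \\[2mm]
\dfrac{1 - a/n}{1 - (a+b)/(2n)} & \text{if } \sigma_u = \sigma_v,\ (u,v) \notin E(G), \\[2mm]
\dfrac{1 - b/n}{1 - (a+b)/(2n)} & \text{if } \sigma_u \ne \sigma_v,\ (u,v) \notin E(G).
\end{cases}
$$

Then,

$$Y_n^2 = 2^{-2n} \sum_{\sigma, \delta \in \{\pm 1\}^n} \prod_{(u,v): u<v} W_{uv}(G, L, \sigma)\, W_{uv}(G, L, \delta). \qquad (22)$$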


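Lemma 4. For any fixed $\sigma, \delta \in \{\pm 1\}^n$, if $\sigma_u \sigma_v = \delta_u \delta_v$, then

$$\mathbb{E}_{\mathbb{P}'_n}[W_{uv}(G, L, \sigma)\, W_{uv}(G, L, \delta)] = 1 + \tau/n + (a-b)^2/(4n^2) + O(n^{-3}).$$

Otherwise,

$$\mathbb{E}_{\mathbb{P}'_n}[W_{uv}(G, L, \sigma)\, W_{uv}(G, L, \delta)] = 1 - \tau/n - (a-b)^2/(4n^2) + O(n^{-3}).$$

Proof. Suppose $\sigma_u \sigma_v = \delta_u \delta_v = 1$. Then,

$$
\begin{aligned}
\mathbb{E}_{\mathbb{P}'_n}[W_{uv}(G, L, \sigma)\, W_{uv}(G, L, \delta)]
&= \sum_{\ell} \left( \frac{2a\mu(\ell)}{a\mu(\ell) + b\nu(\ell)} \right)^2 \frac{a\mu(\ell) + b\nu(\ell)}{2n} + \left( \frac{1 - a/n}{1 - (a+b)/(2n)} \right)^2 \left( 1 - \frac{a+b}{2n} \right) \\
&= \frac{1}{n} \sum_{\ell} \frac{2a^2\mu^2(\ell)}{a\mu(\ell) + b\nu(\ell)} + \left( 1 - \frac{a}{n} \right)^2 \left( 1 + \frac{a+b}{2n} + \frac{(a+b)^2}{4n^2} + O(n^{-3}) \right) \\
&= 1 + \frac{1}{n} \sum_{\ell} \left( \frac{2a^2\mu^2(\ell)}{a\mu(\ell) + b\nu(\ell)} + \frac{b\nu(\ell) - 3a\mu(\ell)}{2} \right) + \frac{(a-b)^2}{4n^2} + O(n^{-3}) \\
&= 1 + \frac{1}{n} \sum_{\ell} \frac{(a\mu(\ell) - b\nu(\ell))^2}{2(a\mu(\ell) + b\nu(\ell))} + \frac{(a-b)^2}{4n^2} + O(n^{-3}) \\
&= 1 + \tau/n + (a-b)^2/(4n^2) + O(n^{-3}). \qquad (23)
\end{aligned}
$$

By symmetry, (23) holds for $\sigma_u \sigma_v = \delta_u \delta_v = -1$. Suppose $\sigma_u = \sigma_v$ and $\delta_u \ne \delta_v$. Then,

$$
\begin{aligned}
\mathbb{E}_{\mathbb{P}'_n}[W_{uv}(G, L, \sigma)\, W_{uv}(G, L, \delta)]
&= \sum_{\ell} \frac{4ab\,\mu(\ell)\nu(\ell)}{(a\mu(\ell) + b\nu(\ell))^2} \cdot \frac{a\mu(\ell) + b\nu(\ell)}{2n} + \frac{(1 - a/n)(1 - b/n)}{(1 - (a+b)/(2n))^2} \left( 1 - \frac{a+b}{2n} \right) \\
&= 1 - \frac{1}{n} \sum_{\ell} \frac{(a\mu(\ell) - b\nu(\ell))^2}{2(a\mu(\ell) + b\nu(\ell))} - \frac{(a-b)^2}{4n^2} + O(n^{-3}) \\
&= 1 - \tau/n - (a-b)^2/(4n^2) + O(n^{-3}).
\end{aligned}
$$

□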
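The two expansions of Lemma 4 can be checked numerically by evaluating the exact second moments and comparing them with $1 \pm (\tau/n + (a-b)^2/(4n^2))$. The sketch below assumes the likelihood-ratio weights $2a\mu(\ell)/(a\mu(\ell)+b\nu(\ell))$, $2b\nu(\ell)/(a\mu(\ell)+b\nu(\ell))$, $(1-a/n)/(1-(a+b)/(2n))$ and $(1-b/n)/(1-(a+b)/(2n))$, with edge-and-label probability $(a\mu(\ell)+b\nu(\ell))/(2n)$ under $\mathbb{P}'_n$. Function names and parameter values are illustrative.

```python
def tau(a, b, mu, nu):
    return sum((a * m - b * v) ** 2 / (2 * (a * m + b * v)) for m, v in zip(mu, nu))

def moment_agree(a, b, mu, nu, n):
    """Exact E'[W(sigma) W(delta)] when sigma_u = sigma_v and delta_u = delta_v."""
    edge = sum((2 * a * m) ** 2 / (a * m + b * v) / (2 * n) for m, v in zip(mu, nu))
    return edge + (1 - a / n) ** 2 / (1 - (a + b) / (2 * n))

def moment_disagree(a, b, mu, nu, n):
    """Exact E'[W(sigma) W(delta)] when sigma_u = sigma_v and delta_u != delta_v."""
    edge = sum(4 * a * m * b * v / (a * m + b * v) / (2 * n) for m, v in zip(mu, nu))
    return edge + (1 - a / n) * (1 - b / n) / (1 - (a + b) / (2 * n))

def scaled_residuals(a, b, mu, nu, n):
    """n^3 times the gap between the exact moments and the two-term expansions."""
    gamma = tau(a, b, mu, nu) / n + (a - b) ** 2 / (4 * n * n)
    return (n ** 3 * (moment_agree(a, b, mu, nu, n) - (1 + gamma)),
            n ** 3 * (moment_disagree(a, b, mu, nu, n) - (1 - gamma)))

residuals = {n: scaled_residuals(5.0, 2.0, [0.7, 0.3], [0.4, 0.6], n) for n in (100, 1000)}
```

If the $O(n^{-3})$ remainder is correct, the scaled residuals stay bounded as $n$ grows; a direct expansion suggests they approach $\pm (a+b)(a-b)^2/8$.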


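In view of Lemma 4, letting $S(\sigma, \delta) = \{(u,v) : u < v,\ \sigma_u \sigma_v = \delta_u \delta_v\}$ and $T(\sigma, \delta) = \{(u,v) : u < v,\ \sigma_u \sigma_v \ne \delta_u \delta_v\}$, and $\gamma_n = \tau/n + (a-b)^2/(4n^2) + O(n^{-3})$, it follows from (22) that

$$\mathbb{E}_{\mathbb{P}'_n}[Y_n^2] = 2^{-2n} \sum_{\sigma, \delta \in \{\pm 1\}^n} (1 + \gamma_n)^{|S(\sigma, \delta)|} (1 - \gamma_n)^{|T(\sigma, \delta)|}. \qquad (24)$$

Define $\rho(\sigma, \delta) = \langle \sigma, \delta \rangle$; then $|S(\sigma, \delta)| = (n^2 + \rho^2)/4 - n/2$ and $|T(\sigma, \delta)| = (n^2 - \rho^2)/4$. It follows from (24) that

$$\mathbb{E}_{\mathbb{P}'_n}[Y_n^2] = (1 + \gamma_n)^{n^2/4 - n/2} (1 - \gamma_n)^{n^2/4} \; 2^{-2n} \sum_{\sigma, \delta \in \{\pm 1\}^n} (1 + \gamma_n)^{\rho^2/4} (1 - \gamma_n)^{-\rho^2/4}. \qquad (25)$$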
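The counting identities $|S(\sigma,\delta)| = (n^2+\rho^2)/4 - n/2$ and $|T(\sigma,\delta)| = (n^2-\rho^2)/4$ with $\rho = \langle\sigma,\delta\rangle$ can be confirmed by exhaustive enumeration for small $n$. This is only a sanity check; the function names are hypothetical.

```python
from itertools import combinations, product

def count_pairs(sigma, delta):
    """Brute-force sizes of S(sigma, delta) and T(sigma, delta)."""
    n = len(sigma)
    agree = sum(1 for u, v in combinations(range(n), 2)
                if sigma[u] * sigma[v] == delta[u] * delta[v])
    return agree, n * (n - 1) // 2 - agree

def count_pairs_formula(sigma, delta):
    """Closed forms (n^2 + rho^2)/4 - n/2 and (n^2 - rho^2)/4 in exact integers."""
    n = len(sigma)
    rho = sum(s * d for s, d in zip(sigma, delta))
    return (n * n + rho * rho - 2 * n) // 4, (n * n - rho * rho) // 4

def formulas_hold(n):
    """Check both identities over all pairs of sign vectors of length n."""
    signs = list(product((1, -1), repeat=n))
    return all(count_pairs(s, d) == count_pairs_formula(s, d)
               for s in signs for d in signs)
```

The identities follow from $\sum_{u<v} \sigma_u\sigma_v\delta_u\delta_v = (\rho^2 - n)/2$; since $\rho$ and $n$ have the same parity, both numerators are divisible by 4 and the integer division is exact.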


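Taylor expansion yields

$$(1 + \gamma_n)^{n^2/4 - n/2} (1 - \gamma_n)^{n^2/4} = (1 + O(n^{-1})) \exp\left[ -\tau^2/4 - \tau/2 \right],$$

$$(1 + \gamma_n)^{\rho^2/4} (1 - \gamma_n)^{-\rho^2/4} = \exp\left[ \frac{\rho^2}{n} \left( \tau/2 + O(n^{-1}) \right) \right]. \qquad (26)$$

Combining (25) and (26), we get that

$$\mathbb{E}_{\mathbb{P}'_n}[Y_n^2] = (1 + O(n^{-1})) \exp\left[ -\tau^2/4 - \tau/2 \right] \mathbb{E}\left[ \exp\left( Z_n^2 \left( \tau/2 + O(n^{-1}) \right) \right) \right], \qquad (27)$$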


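where $Z_n = \frac{1}{\sqrt{n}} \langle \sigma, \delta \rangle$ and $\sigma, \delta$ are independently and uniformly distributed over $\{\pm 1\}^n$. Let $Z$ denote a standard Gaussian random variable. Then the central limit theorem implies that $Z_n$ converges to $Z$ in distribution. Since $x \mapsto \exp(x^2 \tau/2)$ is a continuous mapping, $\exp(Z_n^2 \tau/2)$ converges to $\exp(Z^2 \tau/2)$ in distribution. Moreover, $\{\exp(Z_n^2 \tau/2)\}$ are uniformly bounded in $L_{1+\epsilon}$ norm for some $\epsilon > 0$ and thus uniformly integrable. In particular,

$$
\begin{aligned}
\mathbb{E}\left[ \exp((1+\epsilon) Z_n^2 \tau/2) \right]
&= \int_0^\infty \mathbb{P}\left\{ \exp((1+\epsilon) Z_n^2 \tau/2) > t \right\} dt
\le 1 + \int_1^\infty \mathbb{P}\left\{ |Z_n| > \sqrt{\frac{2 \ln t}{(1+\epsilon)\tau}} \right\} dt \\
&\overset{(a)}{\le} 1 + 2 \int_1^\infty t^{-\frac{1}{(1+\epsilon)\tau}} \, dt \overset{(b)}{<} \infty,
\end{aligned}
$$

where (a) follows from Hoeffding's inequality $\mathbb{P}\{|Z_n| > t\} \le 2\exp(-t^2/2)$; (b) holds by choosing $\epsilon$ sufficiently small such that $(1+\epsilon)\tau < 1$. Hence, $\mathbb{E}[\exp(Z_n^2 \tau/2)]$ converges to $\mathbb{E}[\exp(Z^2 \tau/2)] = \frac{1}{\sqrt{1-\tau}}$. It follows from (27) that when $\tau < 1$, as $n \to \infty$,

$$\mathbb{E}_{\mathbb{P}'_n}[Y_n^2] \to \frac{\exp(-\tau/2 - \tau^2/4)}{\sqrt{1-\tau}}.$$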

Hence, in view of (21), condition 3) of Theorem 7 holds and the second part of Theorem 6 follows from Theorem 7.

5 Conclusion


Our results show that when $\tau < 1$ it is fundamentally impossible to give a positively correlated reconstruction; when $\tau$ is large enough, the labeling information can be effectively exploited through the suitably weighted graph. An interesting direction for future work is to prove the positive part of Conjecture 1.

A Special case of Davis-Kahan sin θ Theorem

The following lemma is the Davis-Kahan $\sin\theta$ theorem [8] specialized to the rank-1 setting. For completeness, we restate the theorem and provide a proof.

Lemma 5. Let $M = \alpha x x^\top$ and $M' = \beta y y^\top$, with $\alpha, \beta \in \mathbb{R}$, $\|x\| = \|y\| = 1$ and $x^\top y \ge 0$. Then

$$\|x - y\| \le \frac{\sqrt{2}}{\max\{|\alpha|, |\beta|\}} \|M - M'\|.$$

Furthermore, if $M'$ is the best rank-1 approximation of $\widetilde{M}$, then

$$\|x - y\| \le \frac{2\sqrt{2}}{\max\{|\alpha|, |\beta|\}} \|M - \widetilde{M}\|.$$

Proof. First define $\theta \in [0, \pi/2]$ by $x^\top y = \cos\theta \ge 0$. Hence we have $\|x - y\| = 2\sin\frac{\theta}{2}$. Moreover, a simple calculation shows that $\min_{\gamma \in \mathbb{R}} \|x - \gamma y\| = \sin\theta$, and for $\theta \in [0, \pi/2]$ we have $\sqrt{2}\sin\frac{\theta}{2} \le \sin\theta$. Hence we get $\|x - y\| \le \sqrt{2} \min_{\gamma} \|x - \gamma y\|$. Taking $\gamma = \frac{\beta}{\alpha} y^\top x$ then gives

$$\|x - y\| \le \sqrt{2} \left\| x - \frac{\beta}{\alpha} y y^\top x \right\| = \frac{\sqrt{2}}{|\alpha|} \|(M - M')x\| \le \frac{\sqrt{2}}{|\alpha|} \|M - M'\|.$$

By symmetry, the first part of the lemma is proved. The second part of the lemma follows from the fact that

$$\|M - M'\| \le \|M - \widetilde{M}\| + \|\widetilde{M} - M'\| \le 2\|M - \widetilde{M}\|,$$

where the last inequality holds because $M$ is of rank 1 and $M'$ is the best rank-1 approximation of $\widetilde{M}$. □
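The first inequality of Lemma 5, $\|x - y\| \le \sqrt{2}\,\|M - M'\| / \max\{|\alpha|, |\beta|\}$, can be checked numerically in dimension two, where the spectral norm of a symmetric matrix has a closed form. The sketch fixes $x = (1, 0)$ and $y = (\cos\theta, \sin\theta)$ with $\theta \in [0, \pi/2]$; the grid of values for $\alpha$, $\beta$ and $\theta$ is an arbitrary choice and the function names are hypothetical.

```python
import math

def op_norm_sym2(p, q, r):
    """Spectral norm of the symmetric 2x2 matrix [[p, q], [q, r]]."""
    mean = (p + r) / 2.0
    radius = math.hypot((p - r) / 2.0, q)
    return max(abs(mean + radius), abs(mean - radius))

def lemma5_sides(alpha, beta, theta):
    """Return (||x - y||, sqrt(2) ||M - M'|| / max(|alpha|, |beta|))."""
    x = (1.0, 0.0)
    y = (math.cos(theta), math.sin(theta))
    # Entries of M - M' = alpha x x^T - beta y y^T.
    p = alpha * x[0] * x[0] - beta * y[0] * y[0]
    q = alpha * x[0] * x[1] - beta * y[0] * y[1]
    r = alpha * x[1] * x[1] - beta * y[1] * y[1]
    lhs = math.hypot(x[0] - y[0], x[1] - y[1])
    rhs = math.sqrt(2.0) * op_norm_sym2(p, q, r) / max(abs(alpha), abs(beta))
    return lhs, rhs

results = [lemma5_sides(al, be, th * math.pi / 40)
           for al in (-3.0, 0.5, 2.0)
           for be in (-1.0, 1.0, 4.0)
           for th in range(21)]
all_ok = all(lhs <= rhs + 1e-12 for lhs, rhs in results)
```

On this grid the bound is attained, up to rounding, at $\theta = \pi/2$, where $x$ and $y$ are orthogonal.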

B Spectrum of Sparse Labeled Stochastic Block Model 

Lemma 6. Assume $a > b > C_0$ for some sufficiently large constant $C_0$. There exists some absolute constant $C$ such that conditional on $\sigma$,

$$\|W' - \mathbb{E}[W|\sigma]\| \le C\sqrt{a+b}, \quad \text{a.a.s.}$$

For the special case of the Erdős-Rényi random graph, i.e., $w(\ell) = 1$ for all $\ell$ and $a = b$, Lemma 6 is proved in [11]. Our analysis is very similar to that given in [11], with small technical differences due to the edge weights. We provide a formal proof below for completeness.

Proof. Let $\widehat{V}$ be the (random) set of vertices remaining and $\widehat{V}^c$ denote the set of vertices removed. For every vertex, its degree is distributed as $\mathrm{Binom}(n-1, \frac{a+b}{2n})$. It is shown in [7, Lemma 39] that there exists a constant $C_1 > 0$ such that a.a.s. $|\widehat{V}^c| \le n \exp(-C_1(a+b))$. To prove the lemma, it suffices to show $|x^\top (W' - \mathbb{E}[W|\sigma]) x| = O(\sqrt{a+b})$ for all $x$ such that $\|x\|_2 = 1$. The proof ideas are borrowed from [13, 11, 18] and consist of three steps:

1. Reduce the problem by proving the same bound for x belonging to a discrete grid. 

2. For the discrete grid, bound the contribution of light pairs (defined below) by applying a 
union bound and a large deviation estimate. 

3. Bound the contribution of heavy pairs using the bounded degree and the discrepancy properties (defined below).


B.1 Reduction to a discrete grid


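For any $0 < \epsilon < 1$, define a grid $T_\epsilon$ which approximates the unit sphere $S^{n-1}$:

$$T_\epsilon = \left\{ x \in \left( \frac{\epsilon}{\sqrt{n}} \mathbb{Z} \right)^n : \|x\| \le 1 \right\}.$$

For every point $x \in S^{n-1}$, there exists some point $y \in T_\epsilon$ such that $\|x - y\| \le \epsilon$. Therefore, $T_\epsilon$ is an $\epsilon$-net of $S^{n-1}$. Moreover, the hypercubes of side length $\epsilon/\sqrt{n}$ centered at the points in $T_\epsilon$ are disjoint. On the other hand, all such hypercubes lie in the ball of radius $(1 + \epsilon/2)$ centered at the origin. Since the volume of a unit ball in $\mathbb{R}^n$ is $\frac{1+o(1)}{\sqrt{\pi n}} \left( \frac{2\pi e}{n} \right)^{n/2}$,

$$|T_\epsilon| \le \frac{1+o(1)}{\sqrt{\pi n}} \left( \frac{2\pi e}{n} \right)^{n/2} \left( 1 + \frac{\epsilon}{2} \right)^n \left( \frac{\sqrt{n}}{\epsilon} \right)^n \le \exp\left( n \left[ \log\left( \frac{1}{\epsilon} + \frac{1}{2} \right) + \frac{1}{2}\log(2\pi e) + o(1) \right] \right). \qquad (28)$$

Lemma 5.4 in [29] implies that

$$\|W' - \mathbb{E}[W|\sigma]\| = \sup_{x \in S^{n-1}} |x^\top (W' - \mathbb{E}[W|\sigma]) x| \le (1 - 2\epsilon)^{-1} \sup_{x \in T_\epsilon} |x^\top (W' - \mathbb{E}[W|\sigma]) x|.$$

Choosing $\epsilon = \frac{1}{4}$, we have $\|W' - \mathbb{E}[W|\sigma]\| \le 2 \sup_{x \in T_{1/4}} |x^\top (W' - \mathbb{E}[W|\sigma]) x|$. Hence, it suffices to bound $\sup_{x \in T_{1/4}} |x^\top (W' - \mathbb{E}[W|\sigma]) x|$.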
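The net property used in this reduction can be illustrated by explicitly rounding a unit vector onto the grid. The sketch assumes that $T_\epsilon$ consists of the points of norm at most 1 whose coordinates are integer multiples of $\epsilon/\sqrt{n}$, as in the grid construction of [11]. Rounding toward zero is one convenient choice made here, because it cannot increase the norm; the function name is hypothetical.

```python
import math

def round_to_grid(x, eps):
    """Round each coordinate toward zero onto multiples of eps / sqrt(n)."""
    step = eps / math.sqrt(len(x))
    return [math.trunc(xi / step) * step for xi in x]

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

# A deterministic unit vector with alternating signs.
raw = [(-1) ** i * (i + 1) for i in range(10)]
x = [r / norm(raw) for r in raw]

eps = 0.25
y = round_to_grid(x, eps)
distance = norm([xi - yi for xi, yi in zip(x, y)])
```

Each coordinate moves by less than $\epsilon/\sqrt{n}$, so the total displacement is at most $\epsilon$, and every coordinate shrinks in absolute value, so $y$ stays in the unit ball.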


B.2 Bounding the contribution of light pairs 

Given an $x \in T_{1/4}$, directly applying a concentration inequality, such as Bernstein's inequality, to $x^\top (W' - \mathbb{E}[W|\sigma]) x$ does not give the desired result. Define the set of light pairs $L_x = \{(u,v) : u < v,\ |x_u x_v| \le \frac{\sqrt{a+b}}{n}\}$ and the set of heavy pairs $H_x = \{(u,v) : u < v\} \setminus L_x$. Observe that

$$\sup_{x \in T_{1/4}} |x^\top (W' - \mathbb{E}[W|\sigma]) x| \le \sup_{x \in T_{1/4}} \left| \sum_{(u,v) \in L_x} x_u W'_{uv} x_v - x^\top \mathbb{E}[W|\sigma] x \right| + \sup_{x \in T_{1/4}} \left| \sum_{(u,v) \in H_x} x_u W'_{uv} x_v \right|.$$

We bound the contribution of heavy pairs separately in the next subsection. Recall that $\widehat{V}$ denotes the set of vertices remaining. Given $\widehat{V} = V$, define $W^1$ by setting to zero the rows and columns of $W$ corresponding to the vertices removed, and define the event

$$\mathcal{E}(V) = \left\{ \sup_{x \in T_{1/4}} \left| \sum_{(u,v) \in L_x} x_u W^1_{uv} x_v - x^\top \mathbb{E}[W|\sigma] x \right| > C\sqrt{a+b} \right\}.$$

Then

$$\mathbb{P}\left\{ \sup_{x \in T_{1/4}} \left| \sum_{(u,v) \in L_x} x_u W'_{uv} x_v - x^\top \mathbb{E}[W|\sigma] x \right| > C\sqrt{a+b} \right\} = \mathbb{P}\{\mathcal{E}(\widehat{V})\} \le 2^n \max_{V} \mathbb{P}\{\mathcal{E}(V)\}. \qquad (29)$$

Lemma 7 below, together with a union bound over all possible points $x \in T_{1/4}$ and (28), implies that for any positive constant $C_2$, there exists a constant $C > 0$ such that $\mathbb{P}\{\mathcal{E}(V)\} \le \exp(-C_2 n)$. In view of (29) and a union bound, we conclude that $\mathbb{P}\{\mathcal{E}(\widehat{V})\}$ is exponentially small by choosing $C$ large enough.

Lemma 7. Fix $x \in T_{1/4}$ and let $V$ be the set of vertices remaining. Define $W^1$ by setting to zero the rows and columns of $W$ corresponding to the vertices removed. Let $X = \sum_{(u,v) \in L_x} x_u W^1_{uv} x_v - x^\top \mathbb{E}[W|\sigma] x$. Assume $a > b > C_0$ for some sufficiently large constant $C_0$. Then $|\mathbb{E}[X]| \le 2\sqrt{a+b}$ and for any constant $C_2 > 0$, there exists some constant $C_3 > 0$ such that

$$\mathbb{P}\left\{ |X - \mathbb{E}[X]| \ge C_3 \sqrt{a+b} \right\} \le \exp(-C_2 n).$$

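Proof. Note that

$$
\begin{aligned}
\mathbb{E}[X] &= \sum_{(u,v) \in L_x} \frac{\alpha + \beta \sigma_u \sigma_v}{n} \mathbf{1}_{\{u,v \in V\}} x_u x_v - x^\top \mathbb{E}[W|\sigma] x \\
&= -\sum_{(u,v) \in H_x} \frac{\alpha + \beta \sigma_u \sigma_v}{n} x_u x_v - \sum_{(u,v) \in L_x} \frac{\alpha + \beta \sigma_u \sigma_v}{n} \left( 1 - \mathbf{1}_{\{u,v \in V\}} \right) x_u x_v.
\end{aligned}
$$

Since $\alpha, \beta \le a + b$, it follows that

$$|\mathbb{E}[X]| \le \frac{a+b}{n} \sum_{(u,v) \in H_x} |x_u x_v| + \frac{a+b}{n} \left( \sum_{u \notin V} |x_u| \sum_{v} |x_v| + \sum_{u} |x_u| \sum_{v \notin V} |x_v| \right). \qquad (30)$$

Notice that $|H_x| \frac{a+b}{n^2} \le \sum_{(i,j) \in H_x} x_i^2 x_j^2 \le 1$. Thus $|H_x| \le \frac{n^2}{a+b}$ and by the Cauchy-Schwarz inequality,

$$\sum_{(u,v) \in H_x} |x_u x_v| \le |H_x|^{1/2} \left( \sum_{(u,v) \in H_x} x_u^2 x_v^2 \right)^{1/2} \le \frac{n}{\sqrt{a+b}}.$$

Again by the Cauchy-Schwarz inequality,

$$\sum_{u \notin V} |x_u| \sum_{v} |x_v| + \sum_{u} |x_u| \sum_{v \notin V} |x_v| \le 2 (n|V^c|)^{1/2} \left( \sum_{u \in [n], v \notin V} x_u^2 x_v^2 \right)^{1/2} \le 2 (n|V^c|)^{1/2} \le 2n e^{-C_1(a+b)/2},$$

where the last inequality follows because a.a.s. $|V^c| \le n \exp(-C_1(a+b))$. It follows from (30) that

$$|\mathbb{E}[X]| \le \sqrt{a+b} + 2(a+b) e^{-C_1(a+b)/2} \le 2\sqrt{a+b},$$

where the last inequality holds when $a > C_0$ for a sufficiently large constant $C_0$.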

Below we bound $|X - \mathbb{E}[X]|$ using the Bernstein inequality. Define

$$X_{uv} = W_{uv}\, x_u x_v \mathbf{1}_{\{(u,v) \in L_x\}} \mathbf{1}_{\{u,v \in V\}}.$$

Then $X - \mathbb{E}[X] = 2\sum_{u<v} (X_{uv} - \mathbb{E}[X_{uv}])$. Note that $|X_{uv} - \mathbb{E}[X_{uv}]| \le \frac{2\sqrt{a+b}}{n}$ and $\mathrm{var}(X_{uv}) \le x_u^2 x_v^2 \frac{a+b}{n}$. Therefore, $\mathrm{var}(X) \le \frac{a+b}{n} \sum_{u,v} x_u^2 x_v^2 \le \frac{a+b}{n}$. It follows from the Bernstein inequality that for any positive universal constant $C_2 > 0$,

$$\mathbb{P}\left\{ |X - \mathbb{E}[X]| \ge \sqrt{2C_2(a+b)} + \frac{4C_2}{3}\sqrt{a+b} \right\} \le e^{-C_2 n}.$$

□
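The way the Bernstein inequality of Appendix E is used here can be sketched numerically. The code assumes the standard form $\mathbb{P}\{\sum_i X_i \ge t\} \le \exp\left(-\frac{t^2}{2\sigma^2 + 2Mt/3}\right)$ for centered summands, so that $t = \sqrt{2\sigma^2 u} + 2Mu/3$ yields a tail of at most $e^{-u}$. The values $\sigma^2 = (a+b)/n$, $M = 2\sqrt{a+b}/n$ and $u = C_2 n$ plugged in for the light pairs are assumptions made for illustration, not constants confirmed by the source.

```python
import math

def bernstein_tail(t, sigma2, M):
    """Bernstein bound exp(-t^2 / (2 sigma^2 + 2 M t / 3))."""
    return math.exp(-t * t / (2 * sigma2 + 2 * M * t / 3))

def bernstein_threshold(u, sigma2, M):
    """Deviation t = sqrt(2 sigma^2 u) + 2 M u / 3 that gives a tail of at most e^{-u}."""
    return math.sqrt(2 * sigma2 * u) + 2 * M * u / 3

def light_pair_threshold(a, b, c2, n):
    """Threshold for the light pairs under the assumed sigma^2, M and u = c2 * n."""
    sigma2 = (a + b) / n
    M = 2 * math.sqrt(a + b) / n
    return bernstein_threshold(c2 * n, sigma2, M)

t_small = light_pair_threshold(5.0, 2.0, 1.0, 100)
t_large = light_pair_threshold(5.0, 2.0, 1.0, 10000)
```

Under these assumptions the threshold does not depend on $n$ and is a constant multiple of $\sqrt{a+b}$, which is the shape of the bound claimed in Lemma 7.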


B.3 Bounding the contribution of heavy pairs 

For the set of heavy pairs, since $w(\ell) \in [-1, 1]$, it follows that

$$\sup_{x \in T_{1/4}} \left| \sum_{(u,v) \in H_x} x_u W'_{uv} x_v \right| \le \sup_{x \in T_{1/4}} \sum_{(u,v) \in H_x} |x_u x_v| A'_{uv}, \qquad (31)$$

where $A'$ is defined by setting to zero the rows and columns of $A$ corresponding to the vertices removed. We upper bound (31) by showing that the graph $G'$ with the adjacency matrix given by $A'$ satisfies the following two properties.

Definition 2 (Bounded degree of order $(d, c_4)$). A graph is said to have the bounded degree property of order $(d, c_4)$ if every vertex has a degree bounded by $c_4 d$ for some universal constant $c_4 > 1$.

Definition 3 (Discrepancy of order $(d, c_5, c_6)$). A graph is said to have the discrepancy property of order $(d, c_5, c_6)$ if for every $S, T \subset [n]$ with $|T| \ge |S|$, one of the following holds:

1. $e(S,T) \le c_5 \frac{d}{n} |S||T|$.

2. $e(S,T) \log\left( \frac{e(S,T)\, n}{d |S||T|} \right) \le c_6 |T| \log\frac{n}{|T|}$,

where $e(S,T)$ denotes the number of edges between vertices in $S$ and vertices in $T$.

Thanks to the removal of edges incident to vertices with degree larger than $\frac{3}{4}(a+b)$, $G'$ satisfies the bounded degree property of order $(a+b, \frac{3}{4})$. In the case with $|T| \ge \frac{n}{e}$,

$$e(S,T) \le |S| \frac{3(a+b)}{4} \le \frac{3e(a+b)}{4n} |S||T|,$$

where the first inequality follows from the bounded degree property. Therefore, $G'$ satisfies the discrepancy property with $d = a+b$ and $c_5 = 3e/4$.

In the case with $|T| < \frac{n}{e}$, let $\widetilde{G}$ denote an Erdős-Rényi random graph with $n$ vertices and edge probability $(a+b)/n$; there exists a coupling such that if $(u,v) \in E(G)$, then $(u,v) \in E(\widetilde{G})$. It is shown in [11, Section 2.2.5] that with probability at least $1 - 1/n$, $\widetilde{G}$ satisfies the discrepancy property of order $(a+b, c_5, c_6)$ for some constants $c_5$ and $c_6$. Since removal of edges only decreases $e(S,T)$, $G'$ also satisfies the discrepancy property of order $(a+b, c_5, c_6)$ with probability at least $1 - 1/n$.

Applying [11, Corollary 2.11], we conclude that there exists some constant $C$ such that

$$\mathbb{P}\left\{ \sup_{x \in T_{1/4}} \sum_{(u,v) \in H_x} |x_u x_v| A'_{uv} \le C\sqrt{a+b} \right\} \ge 1 - \frac{1}{n}.$$

The conclusion follows in view of (29) and (31). □


C Proof of Lemma 2 


We introduce some necessary notation for the labeled tree $T$. For a vertex $v \in T$, let $Y_v$ denote the number of children of $v$. Let $Y_v^{=}$ denote the number of children of $v$ with the same type as $v$ and $Y_v^{\ne} = Y_v - Y_v^{=}$. By the Poisson splitting property, $Y_v^{=}$ and $Y_v^{\ne}$ are independent Poisson random variables with mean $a/2$ and $b/2$, respectively. Let $Y_v^{\ell}$ denote the number of children of $v$ with the edge connected to $v$ being labeled with $\ell$. Let $Y_v^{=,\ell}$ denote the number of children of $v$ with the same type as $v$ and the edge connected to $v$ being labeled with $\ell$, and $Y_v^{\ne,\ell} = Y_v^{\ell} - Y_v^{=,\ell}$. Then $Y_v^{=,\ell}$ and $Y_v^{\ne,\ell}$ are independent Poisson random variables with mean $a\mu(\ell)/2$ and $b\nu(\ell)/2$, respectively.

Similarly, introduce the corresponding notation for $G_R$. Let $V(G_R)$ denote the set of vertices of $G_R$ and $V_R = V \setminus V(G_R)$. Let $V_R^{+1}$ denote the vertices of type $+1$ in $V_R$, and similarly for $V_R^{-1}$. For a vertex $v \in \partial G_R$, let $X_v$ denote the number of children of $v$ in $V_R$ and $X_v^{=}$ denote the number of children of $v$ in $V_R$ with the same type as $v$. Let $X_v^{\ne} = X_v - X_v^{=}$. Then $X_v^{=} \sim \mathrm{Binom}(|V_R^{\sigma_v}|, a/n)$ and $X_v^{\ne} \sim \mathrm{Binom}(|V_R^{-\sigma_v}|, b/n)$. Let $X_v^{\ell}$ denote the number of children of $v$ in $V_R$ with the edge connected to $v$ being labeled with $\ell$. Let $X_v^{=,\ell}$ denote the number of children of $v$ in $V_R$ with the same type as $v$ and the edge connected to $v$ being labeled with $\ell$, and $X_v^{\ne,\ell} = X_v^{\ell} - X_v^{=,\ell}$. Then $X_v^{=,\ell} \sim \mathrm{Binom}(|V_R^{\sigma_v}|, a\mu(\ell)/n)$ and $X_v^{\ne,\ell} \sim \mathrm{Binom}(|V_R^{-\sigma_v}|, b\nu(\ell)/n)$. Note that it is possible to have $u, v \in \partial G_R$ which share the same child in $V_R$, and thus $G_R$ may not be a tree. The goal is to show that such events are rare.

In particular, for any integer $1 \le r \le R$, let $A_r$ denote the event that no vertex in $V_r$ has more than one parent in $G_r$. Let $B_r$ denote the event that there are no edges within $\partial G_r$. Define an event $C_r$ as

$$C_r = \{ |\partial G_s| \le 2^s (a+b)^s \log n \text{ for all } 0 \le s \le r \},$$

which is useful to establish that $V_r$ is large enough so that the binomial distribution is close to the Poisson distribution. Lemmas 4.4 and 4.5 in [22] show that for any $r \le R$,

$$\mathbb{P}_n(A_r | C_r, \sigma) \ge 1 - O(n^{-3/4}),$$

$$\mathbb{P}_n(B_r | C_r, \sigma) \ge 1 - O(n^{-3/4}),$$

$$\mathbb{P}_n(C_{r+1} | C_r, \sigma) \ge 1 - n^{-\log(4/e)}, \qquad (32)$$

and $|G_r| = O(n^{1/8})$ on $C_r$.

We are ready to prove the lemma. Let $V^{+1}$ and $V^{-1}$ denote the sets of vertices in $V$ with type $+1$ and $-1$, respectively. Then a.a.s. $\left| |V^{+1}| - |V^{-1}| \right| \le n^{3/4}$ in view of (13). Suppose that $(G_r, L_{G_r}, \sigma_{G_r}) = (T_r, L_{T_r}, \sigma_{T_r})$ and $C_r$ holds. By (32), the events $A_r$, $B_r$ and $C_{r+1}$ hold simultaneously with probability at least $1 - O(n^{-1/8})$ and $|G_r| = O(n^{1/8})$. Note that if further $X_v^{=,\ell} = Y_v^{=,\ell}$ and $X_v^{\ne,\ell} = Y_v^{\ne,\ell}$ for every $v \in \partial G_r$ and every $\ell \in \mathcal{L}$, then $(G_{r+1}, L_{G_{r+1}}, \sigma_{G_{r+1}}) = (T_{r+1}, L_{T_{r+1}}, \sigma_{T_{r+1}})$.

For each $v \in \partial G_r$, $X_v^{=,\ell} \sim \mathrm{Binom}(|V_r^{\sigma_v}|, a\mu(\ell)/n)$, and

$$n/2 + n^{3/4} \ge |V^{\sigma_v}| \ge |V_r^{\sigma_v}| \ge |V^{\sigma_v}| - |G_r| \ge n/2 - n^{3/4} - O(n^{1/8}).$$

Lemma 4.6 in [22] bounds the total variation distance between binomial and Poisson random variables as

$$\left\| \mathrm{Binom}\left(m, \frac{c}{n}\right) - \mathrm{Poisson}(c) \right\|_{TV} = O\left( \frac{\max\{1, |m - n|\}}{n} \right).$$

Therefore, for any fixed $v \in \partial G_r$ and $\ell \in \mathcal{L}$, $X_v^{=,\ell}$ can be coupled with $Y_v^{=,\ell}$ such that $\mathbb{P}\{X_v^{=,\ell} \ne Y_v^{=,\ell}\} = O(n^{-1/4})$, and similarly for $X_v^{\ne,\ell}$. Since $|\partial G_r| = O(n^{1/8})$ and $\mathcal{L}$ is a finite set, the union bound shows that $X_v^{=,\ell} = Y_v^{=,\ell}$ and $X_v^{\ne,\ell} = Y_v^{\ne,\ell}$ for every $v \in \partial G_r$ and every $\ell \in \mathcal{L}$ with probability at least $1 - O(n^{-1/8})$. Therefore

$$\mathbb{P}\left\{ (G_{r+1}, L_{G_{r+1}}, \sigma_{G_{r+1}}) = (T_{r+1}, L_{T_{r+1}}, \sigma_{T_{r+1}}),\ C_{r+1} \,\middle|\, (G_r, L_{G_r}, \sigma_{G_r}) = (T_r, L_{T_r}, \sigma_{T_r}),\ C_r \right\} \ge 1 - O(n^{-1/8}).$$

By the definition of conditional probability,

$$\mathbb{P}\left\{ (G_{r+1}, L_{G_{r+1}}, \sigma_{G_{r+1}}) = (T_{r+1}, L_{T_{r+1}}, \sigma_{T_{r+1}}),\ C_{r+1} \right\} \ge \left( 1 - O(n^{-1/8}) \right) \mathbb{P}\left\{ (G_r, L_{G_r}, \sigma_{G_r}) = (T_r, L_{T_r}, \sigma_{T_r}),\ C_r \right\}. \qquad (33)$$

Since $\mathbb{P}(C_0) = 1$, and $G_R$ and $T_R$ start at the same root $\rho$, the lemma follows by recursively applying (33).
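The binomial-to-Poisson approximation behind this coupling can be examined numerically. The sketch computes the total variation distance between $\mathrm{Binom}(m, c/n)$ and $\mathrm{Poisson}(c)$ by summing the first terms of both probability mass functions. The cutoff of 40 terms is an assumption that is adequate only for small $c$, and the function names are hypothetical.

```python
import math

def binom_pmf(m, p, j):
    return math.comb(m, j) * p ** j * (1 - p) ** (m - j)

def poisson_pmf(c, j):
    return math.exp(-c) * c ** j / math.factorial(j)

def tv_binom_poisson(m, n, c, cutoff=40):
    """Total variation distance between Binom(m, c/n) and Poisson(c).

    The sum is truncated at `cutoff`; the neglected tail mass is
    negligible for small c.
    """
    p = c / n
    total = 0.0
    for j in range(cutoff + 1):
        pb = binom_pmf(m, p, j) if j <= m else 0.0
        total += abs(pb - poisson_pmf(c, j))
    return total / 2

tv_1e3 = tv_binom_poisson(1000, 1000, 2.0)
tv_1e4 = tv_binom_poisson(10000, 10000, 2.0)
tv_shifted = tv_binom_poisson(9000, 10000, 2.0)   # m differs from n by 1000
```

When $m = n$ the distance shrinks roughly like $1/n$, while a mismatch $|m - n|$ contributes a term of order $|m - n|/n$, in line with the bound quoted from [22].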


D Proof of Lemma 3 


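First consider the graph distribution $\mathbb{P}'_n$. By the method of moments (see Theorem 6.10 [17]), it suffices to show that under $\mathbb{P}'_n$,

$$\mathbb{E}\left[ \prod_{k=3}^{m} \prod_{[\ell]_k} \left( X_n([\ell]_k) \right)_{j([\ell]_k)} \right] \to \prod_{k=3}^{m} \prod_{[\ell]_k} \lambda([\ell]_k)^{j([\ell]_k)} \qquad (34)$$

for all possible non-negative integers $\{j([\ell]_k)\}$. We first show that for any fixed $[\ell]_k$, $\mathbb{E}[X_n([\ell]_k)] \to \lambda([\ell]_k)$.

Let $v_0, \ldots, v_{k-1}$ be $k$ distinct vertices among $n$ vertices. Let $I$ be the indicator that $v_0, \ldots, v_{k-1}$ is a $k$-cycle with labels $[\ell]_k$. Then,

$$\mathbb{E}[I] = \prod_{s=1}^{k} \frac{a\mu(\ell_s) + b\nu(\ell_s)}{2n}.$$

By the linearity of expectation,

$$\mathbb{E}[X_n([\ell]_k)] \overset{(a)}{=} \binom{n}{k} \frac{(k-1)!}{2} \mathbb{E}[I] = \binom{n}{k} \frac{(k-1)!}{2} \prod_{s=1}^{k} \frac{a\mu(\ell_s) + b\nu(\ell_s)}{2n}, \qquad (35)$$

where (a) holds because there are $\binom{n}{k}$ different choices of $v_0, \ldots, v_{k-1}$ and $k!$ different permutations of them; each cycle corresponds to $2k$ different permutations. Therefore, $\mathbb{E}[X_n([\ell]_k)] \to \lambda([\ell]_k)$ as $n \to \infty$ as long as $k = o(\sqrt{n})$.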
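The convergence of the expected cycle count to its limit can be checked directly from the counting formula. The sketch assumes that each labeled edge is present with probability $(a\mu(\ell) + b\nu(\ell))/(2n)$ under the labeled Erdős-Rényi model, and that the limit is $\lambda([\ell]_k) = \frac{1}{2k} \prod_s \frac{a\mu(\ell_s) + b\nu(\ell_s)}{2}$. Function names and parameters are illustrative.

```python
import math

def expected_cycles(n, a, b, mu, nu, labels):
    """binom(n, k) * (k - 1)! / 2 * prod_s (a mu + b nu) / (2 n)."""
    k = len(labels)
    prob = math.prod((a * mu[l] + b * nu[l]) / (2 * n) for l in labels)
    return math.comb(n, k) * math.factorial(k - 1) / 2 * prob

def limiting_mean(a, b, mu, nu, labels):
    """lambda([l]_k) = (1 / (2k)) * prod_s (a mu + b nu) / 2."""
    k = len(labels)
    return math.prod((a * mu[l] + b * nu[l]) / 2 for l in labels) / (2 * k)

a, b = 5.0, 2.0
mu, nu = [0.7, 0.3], [0.4, 0.6]
labels = (0, 1, 1, 0)
ratios = {n: expected_cycles(n, a, b, mu, nu, labels) / limiting_mean(a, b, mu, nu, labels)
          for n in (100, 10 ** 6)}
```

The ratio equals $n(n-1)\cdots(n-k+1)/n^k$, which is roughly $1 - k(k-1)/(2n)$; this is why $k = o(\sqrt{n})$ suffices.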

Then, we argue that $\mathbb{E}[(X_n([\ell]_k))_j] \to (\lambda([\ell]_k))^j$. Note that $(X_n([\ell]_k))_j$ is the number of ordered $j$-tuples of $k$-cycles with labels $[\ell]_k$ in $G$. Divide these $j$-tuples into two sets: $A$ is the set of $j$-tuples for which all of the $k$-cycles are disjoint, and $B$ is the set of the rest of the $j$-tuples.

Take $(C_1, C_2, \ldots, C_j) \in A$. Since the $C_i$'s are disjoint, they appear independently. By the previous argument, it follows that the cycles $C_1, \ldots, C_j$ are all present in $G$ with probability

$$\prod_{i=1}^{j} \prod_{s=1}^{k} \frac{a\mu(\ell_s) + b\nu(\ell_s)}{2n}.$$

Since there are $\binom{n}{kj} \frac{(kj)!}{(2k)^j}$ elements in $A$, the expected number of vertex-disjoint $j$-tuples of $k$-cycles with labels $[\ell]_k$ is

$$\binom{n}{kj} \frac{(kj)!}{(2k)^j} \prod_{i=1}^{j} \prod_{s=1}^{k} \frac{a\mu(\ell_s) + b\nu(\ell_s)}{2n} \to \lambda([\ell]_k)^j.$$

Let $I$ be the number of non-vertex-disjoint $j$-tuples. Then the distribution of $I$ is stochastically dominated by the distribution of $I$ under an Erdős-Rényi random graph $\mathcal{G}(n, \frac{\max\{a,b\}}{n})$. It is shown in [3, Corollary 4.4] that if $k = O(\log^{1/4} n)$, then $\mathbb{E}[I] \to 0$ for any $\mathcal{G}(n, c/n)$ with constant $c$. Hence, $\mathbb{E}[I] \to 0$ under $\mathbb{P}'_n$.

Finally, note that the same argument applies to any joint factorial moment corresponding to cycles with different lengths and labels. Thus equation (34) follows.

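Next consider the graph distribution $\mathbb{P}_n$. It suffices to show that under $\mathbb{P}_n$,

$$\mathbb{E}\left[ \prod_{k=3}^{m} \prod_{[\ell]_k} \left( X_n([\ell]_k) \right)_{j([\ell]_k)} \right] \to \prod_{k=3}^{m} \prod_{[\ell]_k} \xi([\ell]_k)^{j([\ell]_k)}. \qquad (36)$$

We claim that for any fixed $[\ell]_k$, $\mathbb{E}[X_n([\ell]_k)] \to \xi([\ell]_k)$. Let $v_0, \ldots, v_{k-1}$ be $k$ distinct vertices among $n$ vertices. Let $I$ be the indicator that $v_0, \ldots, v_{k-1}$ is a $k$-cycle with labels $[\ell]_k$ and $v_0$ being the vertex with the minimum index. Let $v_k = v_0$; then $\mathbb{E}[X_n([\ell]_k)] = \binom{n}{k} \frac{(k-1)!}{2} \mathbb{E}[I]$ and

$$\mathbb{E}[I | \sigma_{v_0}, \ldots, \sigma_{v_{k-1}}] = n^{-k} \prod_{1 \le i \le k:\ \sigma_{v_{i-1}} = \sigma_{v_i}} a\mu(\ell_i) \prod_{1 \le i \le k:\ \sigma_{v_{i-1}} \ne \sigma_{v_i}} b\nu(\ell_i).$$

Notice that there is always an even number of indices $i$ such that $\sigma_{v_{i-1}} \ne \sigma_{v_i}$. Thus,

$$\mathbb{E}[I] = \mathbb{E}_\sigma[\mathbb{E}[I|\sigma]] = (2n)^{-k} \left( \prod_{i=1}^{k} (a\mu(\ell_i) + b\nu(\ell_i)) + \prod_{i=1}^{k} (a\mu(\ell_i) - b\nu(\ell_i)) \right).$$

Therefore, $\mathbb{E}[X_n([\ell]_k)] \to \xi([\ell]_k)$ and by the same argument as before, equation (36) holds.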
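The parity argument can be verified by brute force: averaging the conditional cycle probability over all type assignments of the $k$ cycle vertices should reproduce $(2n)^{-k}\left(\prod_i (a\mu(\ell_i) + b\nu(\ell_i)) + \prod_i (a\mu(\ell_i) - b\nu(\ell_i))\right)$. The sketch assumes independent uniform types on the cycle vertices, which ignores any exact balance constraint on the communities. Function names are hypothetical.

```python
from itertools import product
from math import prod

def cycle_prob_bruteforce(n, a, b, mu, nu, labels):
    """Average over sigma in {+1, -1}^k of the conditional cycle probability."""
    k = len(labels)
    total = 0.0
    for sigma in product((1, -1), repeat=k):
        p = 1.0
        for i, l in enumerate(labels):
            # Edge i joins vertex i and vertex i + 1 (mod k) along the cycle.
            same = sigma[i] == sigma[(i + 1) % k]
            p *= (a * mu[l] if same else b * nu[l]) / n
        total += p
    return total / 2 ** k

def cycle_prob_formula(n, a, b, mu, nu, labels):
    """(2n)^{-k} * (prod (a mu + b nu) + prod (a mu - b nu))."""
    k = len(labels)
    plus = prod(a * mu[l] + b * nu[l] for l in labels)
    minus = prod(a * mu[l] - b * nu[l] for l in labels)
    return (plus + minus) / (2 * n) ** k

args = (1000, 5.0, 2.0, [0.7, 0.3], [0.4, 0.6])
exact = cycle_prob_bruteforce(*args, (0, 1, 1, 0, 1))
closed = cycle_prob_formula(*args, (0, 1, 1, 0, 1))
```

The closed form follows because only patterns with an even number of type changes around the cycle are possible, and summing over those patterns keeps exactly the even-degree terms of the product expansion.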


E Bernstein Inequality 

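Theorem 8. Let $X_1, \ldots, X_n$ be independent zero-mean random variables such that $|X_i| \le M$ almost surely. Let $\sigma_i^2 = \mathrm{var}(X_i)$ and $\sigma^2 = \sum_{i=1}^{n} \sigma_i^2$. Then

$$\mathbb{P}\left\{ \sum_{i=1}^{n} X_i \ge t \right\} \le \exp\left( -\frac{t^2}{2\sigma^2 + \frac{2}{3} M t} \right).$$

It follows that

$$\mathbb{P}\left\{ \sum_{i=1}^{n} X_i \ge \sqrt{2\sigma^2 u} + \frac{2Mu}{3} \right\} \le e^{-u}.$$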